Add OCR Languages
By default, the Papermerge Docker image includes English, German, French, Italian, Spanish, Dutch, Romanian and Portuguese OCR languages.
You can install extra languages by building a new Docker image on top of the base papermerge/papermerge image.
Create a new Dockerfile with the following content:
FROM papermerge/papermerge:3.1
# add Danish and Polish OCR languages
RUN apt-get update \
    && apt-get install -y tesseract-ocr-dan tesseract-ocr-pol \
    && rm -rf /var/lib/apt/lists/*
Each language is identified by its three-letter ISO 639-2/T code (the second column in the table).
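Writing the RUN line by hand gets tedious when you need several languages. The shell function below is a minimal sketch that turns a list of language codes into such a Dockerfile. The name write_ocr_dockerfile is hypothetical and not part of Papermerge. Like the example above, it assumes a Debian-based base image that has apt-get and builds as root. It also assumes each language ships as a Debian package named tesseract-ocr-<code>, where underscores in Tesseract data names such as chi_sim become hyphens.

```shell
#!/bin/sh
# Hypothetical helper: write_ocr_dockerfile BASE_IMAGE CODE...
# Prints a Dockerfile that installs one Tesseract language package per code.
# Assumes a Debian-based base image with apt-get, built as root.
write_ocr_dockerfile() {
    if [ "$#" -lt 2 ]; then
        echo "usage: write_ocr_dockerfile BASE_IMAGE CODE..." >&2
        return 1
    fi
    base=$1
    shift

    packages=""
    for code in "$@"; do
        # Debian package names use hyphens where Tesseract data names use
        # underscores, e.g. chi_sim -> tesseract-ocr-chi-sim
        pkg="tesseract-ocr-$(printf '%s' "$code" | tr '_' '-')"
        packages="$packages $pkg"
    done

    printf 'FROM %s\n' "$base"
    printf '# add OCR languages: %s\n' "$*"
    # Refresh the package index, install without prompting (-y),
    # then delete the index again to keep the image small
    printf 'RUN apt-get update \\\n'
    printf '    && apt-get install -y%s \\\n' "$packages"
    printf '    && rm -rf /var/lib/apt/lists/*\n'
}

# Example: the same two languages as the Dockerfile above
# write_ocr_dockerfile papermerge/papermerge:3.1 dan pol > Dockerfile
```

The function only prints text, so you can inspect the generated Dockerfile before building it.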
To build your image, run:
docker build -t mypaper:3.1 -f Dockerfile .
Check that the OCR languages were installed:
docker run -it --rm mypaper:3.1 tesseract --list-langs
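The tesseract --list-langs command prints a header line followed by one language code per line. To check for specific languages in a script, for example in CI, you could pipe that output into a small function like the hypothetical check_ocr_langs below. It reports every missing code and returns non-zero if any are absent. This is a sketch that assumes the one-code-per-line output format. The usage example runs docker without -it, because a TTY adds carriage returns to piped output (the function strips them anyway). It also adds 2>&1 as a precaution in case your Tesseract version writes the list to stderr.

```shell
#!/bin/sh
# Hypothetical helper: check_ocr_langs CODE...
# Reads `tesseract --list-langs` output on stdin and reports each requested
# code that is not listed. Returns 0 only if every code is present.
check_ocr_langs() {
    # Read all of stdin, dropping carriage returns (added when docker uses -t)
    available=$(tr -d '\r')
    missing=0
    for code in "$@"; do
        # -x: whole-line match, -F: fixed string, -q: quiet, -e: code may start with '-'
        if ! printf '%s\n' "$available" | grep -qxF -e "$code"; then
            echo "missing OCR language: $code" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (IMAGE is the tag you passed to docker build -t):
# docker run --rm "$IMAGE" tesseract --list-langs 2>&1 | check_ocr_langs dan pol
```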