Add OCR Languages

By default, the Papermerge Docker image includes the English, German, French, Italian, Spanish, Dutch, Romanian and Portuguese OCR languages.

You can install extra languages by building a new Docker image based on papermerge/papermerge.

Create a new Dockerfile with the following content:

FROM papermerge/papermerge:3.1

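# add Danish and Polish OCR languages;
# refresh the package index first and pass -y so the build never waits for confirmation
RUN apt-get update && apt-get install -y tesseract-ocr-dan tesseract-ocr-pol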

Each language is identified by its three-letter code from the ISO 639-2/T standard (the second column in the table).
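
As a rough illustration of how the codes map to package names, the sketch below builds the package list for a set of language codes. The helper name ocr_packages is hypothetical, and it assumes every language follows the tesseract-ocr-<code> pattern seen in the Dockerfile above; check that the package actually exists in the image's distribution before relying on it.

```shell
# Hypothetical helper: turn three-letter language codes into package names.
# Assumes the tesseract-ocr-<code> naming used in the Dockerfile above.
ocr_packages() {
    pkgs=""
    for code in "$@"; do
        # Append with a separating space once the list is non-empty.
        pkgs="${pkgs:+$pkgs }tesseract-ocr-$code"
    done
    printf '%s\n' "$pkgs"
}

# Example: print the package names for Danish and Polish.
ocr_packages dan pol
```

The printed names can be pasted into the RUN line of the Dockerfile.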

To build your image, run:

docker build -t mypaper:3.0 -f Dockerfile .

Check that OCR languages were installed:

docker run -it --rm mypaper:3.0 tesseract --list-langs
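
If you would rather not read the list by eye, the check can be scripted. The following is a minimal sketch, not part of Papermerge: the helper name has_langs is hypothetical, and it assumes tesseract prints each installed language code on its own line.

```shell
# Hypothetical helper: read `tesseract --list-langs` output on stdin and
# report every requested language code that is not in the list.
has_langs() {
    installed=$(cat)
    missing=0
    for code in "$@"; do
        # -x matches whole lines only, so "dan" will not match a longer code.
        if ! printf '%s\n' "$installed" | grep -qx "$code"; then
            echo "missing OCR language: $code" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (image name taken from the build step above):
#   docker run --rm mypaper:3.0 tesseract --list-langs | has_langs dan pol
```

The function returns a non-zero status when any language is missing, so it can gate a later step in a build script.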