OCR

This section groups all OCR-specific configuration options.

DEFAULT_LANGUAGE

By default, Papermerge performs OCR using the language specified by this option. Set it to the language used by the majority of your documents. For a detailed list of three-letter codes, see the 639-2/T column of ISO 639-2.

Example as environment variable:

PAPERMERGE__OCR__DEFAULT_LANGUAGE=spa

Example in the TOML configuration file:

[ocr]
default_language="spa"

Default value is "deu" (German language).

LANGUAGES

Note

This option may be defined only in the TOML configuration file.

Defines all languages available for OCR. This option is defined as an inline table in which each key is an ISO 639-2 code and each value is a human-readable name for the language.

Example:

[ocr]
languages = { heb = "hebrew", jpn = "japanese"}

Note that the Tesseract language data for both Hebrew and Japanese must be installed. The command further below lists the languages available to Tesseract.

Important

The languages value must be written on a single line! This is a requirement of the TOML inline table format.

List all available languages:

$ tesseract --list-langs
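
The check can also be scripted. The following is a minimal sketch, not part of Papermerge, that compares the codes configured in `languages` with the output of `tesseract --list-langs`. It assumes the output consists of a header line starting with "List of available languages" followed by one language code per line. The function names are hypothetical.

```python
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the header line starts with "List of available languages".
    return {ln for ln in lines if not ln.startswith("List of available languages")}


def missing_languages(configured: dict, output: str) -> list[str]:
    """Return the configured codes that Tesseract does not report, sorted."""
    installed = parse_list_langs(output)
    return sorted(code for code in configured if code not in installed)


def tesseract_output() -> str:
    """Run the command shown above and return everything it printed."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Some older Tesseract releases print the list to stderr, so keep both.
    return result.stdout + result.stderr


if __name__ == "__main__":
    configured = {"heb": "hebrew", "jpn": "japanese"}  # example from this section
    try:
        print("missing:", missing_languages(configured, tesseract_output()))
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("could not run tesseract:", exc)
```

An empty result means every configured language is reported by Tesseract.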

Default value:

[ocr]
languages = { deu = "Deutsch", eng = "English" }

See Adding OCR Language for a detailed example of using this option.