OCR
This section groups all OCR-specific configuration options.
DEFAULT_LANGUAGE
By default, Papermerge performs OCR using the language specified by this option. Set it to the language used by the majority of your documents. For a detailed list of three-letter codes, see the 639-2/T column of the ISO 639-2 code list.
Example as an environment variable:
PAPERMERGE__OCR__DEFAULT_LANGUAGE=spa
Example in the TOML configuration file:
[ocr]
default_language="spa"
The default value is "deu" (German).
LANGUAGES
Note
This option may be defined only in the TOML configuration file.
Defines all languages available for OCR. This option is defined as an inline table in which each key is an ISO 639-2 code and each value is the human-readable name of the language.
Example:
[ocr]
languages = { heb = "hebrew", jpn = "japanese"}
Note that the Tesseract language data for both Hebrew and Japanese must be installed. You can check which languages are available to Tesseract with the command shown below.
Important
The languages value must be written on one line! This is a requirement of the TOML inline table format.
List all available languages:
$ tesseract --list-langs
Default value:
[ocr]
languages = { deu = "Deutsch", eng = "English" }
See Adding OCR Language for a detailed example of using this option.