Settings
Papermerge loads its settings from a configuration file. It first tries to read the following files:
/etc/papermerge.conf.py
papermerge.conf.py in the current project directory
If neither of the above files exists, Papermerge checks the environment variable PAPERMERGE_CONFIG_FILE. If PAPERMERGE_CONFIG_FILE points to an existing file, Papermerge reads its configuration from there.
If all of the above attempts fail, Papermerge uses default configuration values and issues a warning. To get rid of the warning message, create an empty configuration file papermerge.conf.py in the project root directory (right next to papermerge.conf.py.example) or at /etc/papermerge.conf.py.
The configuration file uses Python syntax.
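The lookup order described above can be sketched in a few lines of Python. This is an illustration only, not Papermerge's actual loader: the function name find_config_file and its parameters are hypothetical.

```python
import os


def find_config_file(project_dir=".", environ=None,
                     system_path="/etc/papermerge.conf.py"):
    """Return the first configuration file found, or None.

    Hypothetical helper that mirrors the documented lookup order.
    """
    environ = os.environ if environ is None else environ

    # The two fixed locations are tried first.
    for path in (system_path, os.path.join(project_dir, "papermerge.conf.py")):
        if os.path.isfile(path):
            return path

    # Otherwise fall back to the file named by PAPERMERGE_CONFIG_FILE, if any.
    env_path = environ.get("PAPERMERGE_CONFIG_FILE")
    if env_path and os.path.isfile(env_path):
        return env_path

    # Nothing found: the caller would use defaults and issue a warning.
    return None
```

A return value of None corresponds to the case where default values are used and a warning is issued.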
Django Settings
Papermerge is based on the Django web framework. If you know how Django projects are configured, Papermerge’s configuration internals will be more familiar to you. One particularly important thing to be aware of is the DJANGO_SETTINGS_MODULE environment variable, which is Django specific. Learn more about Django’s settings from the Django documentation.
PAPERMERGE_ Prefix
It makes a slight difference where you place the Papermerge settings enumerated below. In short, when placed in the papermerge.conf.py file, they don’t need the PAPERMERGE_ prefix, while the very same setting placed in the Django settings file needs the PAPERMERGE_ prefix.
Papermerge settings can be placed in either of two files:
papermerge.conf.py file
Django settings file (the one referenced by the DJANGO_SETTINGS_MODULE environment variable)
In the papermerge.conf.py file, configuration settings have no PAPERMERGE_ prefix, because all (well, 90%) of them are Papermerge specific. The Django settings file, however, contains all sorts of settings: for Celery (prefixed with CELERY_), for allauth (prefixed with ACCOUNT_). Likewise, settings specific to Papermerge are prefixed as well. Thus any setting listed below needs the PAPERMERGE_ prefix when added directly to the Django settings file.
The configuration file papermerge.conf.py is there for convenience. Most of the time you will need only that file.
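As a small illustration of the prefix rule, the same setting could look like this in each file. The value "eng" is only an example, and the two assignments would normally live in two different files.

```python
# In papermerge.conf.py: no prefix is needed.
OCR_DEFAULT_LANGUAGE = "eng"

# In the Django settings file: the same setting carries the PAPERMERGE_ prefix.
PAPERMERGE_OCR_DEFAULT_LANGUAGE = "eng"
```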
Main App, Worker or Both?
Some configuration variables apply to the worker only (the part that OCRs documents, imports documents from a local directory, or fetches them from an IMAP/email account), some apply to the main app only, and some apply to both. This distinction becomes apparent when you deploy the main app and the worker on separate hosts. It also matters for containerized deployments via Docker, because the main app and the worker usually run in different containers and thus have different copies of the papermerge.conf.py file.
Each setting below states the context it applies to. “context: worker” means the variable applies only in the context of the worker, i.e. it needs to be changed in papermerge.conf.py on the worker instance/host/container. “context: main app, worker” means the setting needs to be changed on both the main app and the worker in order to function properly.
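For a deployment on separate hosts or containers, the two copies of papermerge.conf.py might look roughly as follows. This is a sketch: the MEDIA_DIR path is an assumed example, and both files are shown in one block only for comparison.

```python
# --- papermerge.conf.py on the main app host/container ---
# MEDIA_DIR applies to both contexts; DBDIR and STATIC_DIR are main app only.
MEDIA_DIR = "/opt/papermerge/media/"
DBDIR = "/opt/papermerge/db/"
STATIC_DIR = "/opt/papermerge/static/"

# --- papermerge.conf.py on the worker host/container ---
# MEDIA_DIR is repeated here; IMPORTER_DIR is a worker-only setting.
MEDIA_DIR = "/opt/papermerge/media/"
IMPORTER_DIR = "/opt/papermerge/import/"
```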
Some of the most used settings which you might be interested in:
MEDIA_DIR - location where all uploaded/imported documents are stored
OCR Languages Support - user can select one of those languages to perform OCR
OCR_DEFAULT_LANGUAGE - default language for OCR
Paths and Folders
DBDIR
/path/to/papermerge/sqlite/db/
context: main app
Defines the location where db.sqlite3 is saved. By default, the project’s local directory is used.
Example:
DBDIR = "/opt/papermerge/db/"
MEDIA_DIR
/path/to/media/
context: main app, worker
Defines the directory where all uploaded documents are stored.
By default, a folder named media in the project’s local directory is used.
STATIC_DIR
/path/to/collected/static/assets/
context: main app
Location where all static assets of the Papermerge project (JavaScript files, CSS files) are copied by the ./manage.py collectstatic command.
By default, a folder named static in the project’s local directory is used.
Example:
STATIC_DIR = "/opt/papermerge/static/"
Document Importer
The importer is a command line utility, invoked with ./manage.py importer, that imports all documents from a local directory.
IMPORTER_DIR
/path/where/documents/will/be/imported/from/
context: worker
Location on the local file system from which Papermerge will try to import documents.
Example:
IMPORTER_DIR = "/opt/papermerge/import/"
OCR
OCR_LANGUAGES
context: main app, worker
Additional languages for text OCR. A dictionary where each key is an ISO 639-2/T code and each value is the human-readable name of the language.
Example:
OCR_LANGUAGES = {
    'heb': 'hebrew',
    'jpn': 'japanese'
}
Note that both Hebrew and Japanese language data for Tesseract must be installed. You can check Tesseract’s available languages with the following command:
$ tesseract --list-langs
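The same check can be automated. The sketch below compares the keys of an OCR_LANGUAGES dictionary with the codes reported by the command above. The helper names are hypothetical, and the parser assumes the output consists of a "List of available languages" header followed by one code per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    codes = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks and the "List of available languages ..." header line.
        if not line or line.startswith("List of available languages"):
            continue
        codes.add(line)
    return codes


def missing_languages(ocr_languages, installed):
    """Return configured codes that Tesseract does not report, sorted."""
    return sorted(code for code in ocr_languages if code not in installed)


def installed_languages(binary="/usr/bin/tesseract"):
    """Run Tesseract and return the set of language codes it reports."""
    result = subprocess.run(
        [binary, "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Read both streams in case the list is not printed to stdout.
    return parse_list_langs(result.stdout + "\n" + result.stderr)
```

Calling missing_languages(OCR_LANGUAGES, installed_languages()) then returns the codes for which language data still has to be installed.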
The default value of OCR_LANGUAGES is:
OCR_LANGUAGES = {
    "deu": "Deutsch",  # German language
    "eng": "English",
}
OCR_DEFAULT_LANGUAGE
context: main app, worker
By default, Papermerge uses the language specified with this option to perform OCR. Set this value to the language used by the majority of your documents.
Example:
OCR_DEFAULT_LANGUAGE = "spa"
Default value is “deu” (German language).
I18n and Localization
LANGUAGE_CODE
context: main app
This option specifies the language of the user interface. There are two options:
en - for user interface in English language
de - for user interface in German language
English is the default fallback, i.e. if you don’t specify anything or specify an unsupported language, English will be used.
Instead of en you can use en-US, en-GB etc.
Instead of de you can use de-DE, de-AT etc.
See here for the full list of all available language codes.
You can translate Papermerge to your own language.
Default value: en
LANGUAGE_FROM_AGENT
If set to True, Papermerge uses the same language code as your web browser (agent). Browsers send an ‘Accept-Language’ header with their locale. For more, read here.
If True - overrides the LANGUAGE_CODE option. With LANGUAGE_FROM_AGENT=True, whatever locale your web browser runs in is also used by the Papermerge instance.
If False - the language code specified in the LANGUAGE_CODE option is used and the browser’s ‘Accept-Language’ header is ignored.
Default value: False
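How LANGUAGE_CODE and LANGUAGE_FROM_AGENT interact can be illustrated with a simplified sketch. It is not Papermerge's implementation: the function names are hypothetical, and falling back to LANGUAGE_CODE when the browser offers no supported language is an assumption.

```python
SUPPORTED = ("en", "de")  # the two user interface languages listed above


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        if not tag:
            continue
        quality = 1.0  # a tag without a q-value has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def pick_language(language_code, language_from_agent, accept_language=""):
    """Choose "en" or "de" for the user interface (hypothetical helper)."""
    candidates = []
    if language_from_agent:
        # The browser's preferences take precedence over LANGUAGE_CODE.
        candidates.extend(parse_accept_language(accept_language))
    candidates.append(language_code.lower())
    for tag in candidates:
        base = tag.split("-")[0]  # "de-AT" counts as "de"
        if base in SUPPORTED:
            return base
    return "en"  # English is the documented fallback
```

For example, pick_language("de", True, "en-US,en;q=0.9") yields "en", while the same call with LANGUAGE_FROM_AGENT set to False yields "de".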
Database
By default, Papermerge uses an SQLite3 database (which is a file located in DBDIR). Alternatively, you can use a PostgreSQL or MySQL/MariaDB database. The following options configure PostgreSQL and MySQL/MariaDB database connections.
DBTYPE
context: main app
DB type (if different from SQLite3). For a PostgreSQL database, use one of the following values:
pg
postgre
postgres
postgresql
For a MySQL/MariaDB database (they share the same database backend), use one of the following values:
my
mysql
maria
mariadb
Example:
DBTYPE = "mysql"
DBUSER
context: main app
DB user used for database connection.
Example:
DBUSER = "john"
DBNAME
context: main app
Database name. Default value is papermerge.
DBHOST
context: main app
Database host. Default value is localhost.
DBPORT
context: main app
Database port. The port must be specified as an integer, without string quotes.
Example:
DBPORT = 5432
Default value is 5432 for PostgreSQL and 3306 for MySQL/MariaDB.
DBPASS
context: main app
Password for connecting to the database. Default value is an empty string.
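To show how these options fit together, here is a sketch that turns them into a Django DATABASES dictionary. It is not Papermerge's own code: build_databases is a hypothetical helper and the empty default for DBUSER is an assumption. The engine strings are the standard Django backend names.

```python
import os

POSTGRES_ALIASES = {"pg", "postgre", "postgres", "postgresql"}
MYSQL_ALIASES = {"my", "mysql", "maria", "mariadb"}


def build_databases(cfg):
    """Map DB* options (a dict) to a Django DATABASES setting."""
    dbtype = cfg.get("DBTYPE", "")
    if dbtype in POSTGRES_ALIASES:
        engine, default_port = "django.db.backends.postgresql", 5432
    elif dbtype in MYSQL_ALIASES:
        engine, default_port = "django.db.backends.mysql", 3306
    else:
        # No DBTYPE given: SQLite3 file located in DBDIR.
        db_file = os.path.join(cfg.get("DBDIR", "."), "db.sqlite3")
        return {"default": {"ENGINE": "django.db.backends.sqlite3",
                            "NAME": db_file}}
    return {
        "default": {
            "ENGINE": engine,
            "NAME": cfg.get("DBNAME", "papermerge"),
            "USER": cfg.get("DBUSER", ""),  # assumed default
            "PASSWORD": cfg.get("DBPASS", ""),
            "HOST": cfg.get("DBHOST", "localhost"),
            "PORT": cfg.get("DBPORT", default_port),
        }
    }
```

With DBTYPE = "pg" and DBUSER = "john", the result uses the PostgreSQL backend, database name papermerge, host localhost and port 5432.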
EMail
You can import documents directly from an email/IMAP account. All email importer settings must be defined in papermerge.conf.py on the worker side. Read the details about ingesting documents via an IMAP account in the document consumption chapter.
IMPORT_MAIL_HOST
context: worker
IMAP Server host.
IMPORT_MAIL_USER
context: worker
Email account/IMAP user. The IMAP user needs read and write access to the IMAP “INBOX” folder.
IMPORT_MAIL_PASS
context: worker
Email account/IMAP password.
IMPORT_MAIL_INBOX
context: worker
IMAP folder to read email from. The default value for this setting is “INBOX”.
IMPORT_MAIL_BY_USER
context: worker
Whether emails sent from a user’s own email address are delivered to that user’s inbox folder. This capability of assigning attached documents to the correct user’s inbox is called email routing and is described at length in One IMAP Account for Many Papermerge Users.
IMPORT_MAIL_BY_SECRET
context: worker
Whether emails containing a user’s own secret are delivered to that user’s inbox folder. This capability of assigning attached documents to the correct user’s inbox is called email routing and is described at length in One IMAP Account for Many Papermerge Users.
IMPORT_MAIL_DELETE
context: worker
Whether to delete emails after processing.
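Put together, the email importer section of a worker's papermerge.conf.py might look like the fragment below. Host, user and password are placeholders, and treating the three "whether to" options as booleans is an assumption based on their descriptions.

```python
# papermerge.conf.py on the worker (all values are placeholders)
IMPORT_MAIL_HOST = "imap.example.com"
IMPORT_MAIL_USER = "papermerge@example.com"
IMPORT_MAIL_PASS = "change-me"
IMPORT_MAIL_INBOX = "INBOX"      # the documented default

# Assumed to be booleans.
IMPORT_MAIL_BY_USER = True       # route by sender address
IMPORT_MAIL_BY_SECRET = False    # route by per-user secret
IMPORT_MAIL_DELETE = False       # keep emails after processing
```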
Binary Dependencies
Papermerge uses a number of open source third-party tools for various purposes. One of the most obvious examples is tesseract, used to OCR documents (extract text from binary image files). Another, less obvious example is the pdfinfo utility provided by the poppler-utils package: pdfinfo is used to count the number of pages in a PDF document. The settings listed below allow you to override the path to a specific dependency.
BINARY_OCR
context: worker
Full path to the tesseract binary/executable file. Tesseract is used for OCR operations, i.e. extracting text from binary image files (jpeg, png, tiff). Default value is:
BINARY_OCR = "/usr/bin/tesseract"
BINARY_FILE
context: main app, worker
The file utility is used to find out the MIME type of a given file. Default value is:
BINARY_FILE = "/usr/bin/file"
BINARY_CONVERT
context: main app, worker
The convert utility is provided by the ImageMagick package. It is used for resizing images. Default value is:
BINARY_CONVERT = "/usr/bin/convert"
BINARY_PDFTOPPM
context: main app, worker
Provided by Poppler Utils. Used to extract images from a PDF file. Default value is:
BINARY_PDFTOPPM = "/usr/bin/pdftoppm"
BINARY_PDFINFO
context: main app, worker
Provided by Poppler Utils. Used to get the page count of a PDF file. Default value is:
BINARY_PDFINFO = "/usr/bin/pdfinfo"
BINARY_STAPLER
context: main app, worker
Provided by stapler. This external tool is used to reorder, cut/paste, and delete pages within a PDF document.
Default value is:
BINARY_STAPLER = "/usr/bin/stapler"
Depending on your system and the way you installed stapler, you may want to adjust the BINARY_STAPLER path.
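Before starting Papermerge, it can be useful to verify that the configured paths point to executable files. The helper below is a hypothetical sketch and is not shipped with Papermerge. It takes a dictionary of setting names and paths.

```python
import os


def missing_binaries(binaries):
    """Return the names of settings whose path is not an executable file."""
    return sorted(
        name
        for name, path in binaries.items()
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    )


# The default values listed in this section.
DEFAULT_BINARIES = {
    "BINARY_OCR": "/usr/bin/tesseract",
    "BINARY_FILE": "/usr/bin/file",
    "BINARY_CONVERT": "/usr/bin/convert",
    "BINARY_PDFTOPPM": "/usr/bin/pdftoppm",
    "BINARY_PDFINFO": "/usr/bin/pdfinfo",
    "BINARY_STAPLER": "/usr/bin/stapler",
}
```

Calling missing_binaries(DEFAULT_BINARIES) lists the settings that need a different path on the current host.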