Docker Compose
This section describes how to set up Papermerge DMS using docker compose. As mentioned in the overview section, the Papermerge DMS philosophy revolves around the concept of progressive setup. As such, you are advised to start with simple setups and progress, in small steps, towards more complicated scenarios. The idea is that along the way you will learn and better understand the internals, which in turn will enable you to build very creative deployments.
Database/PostgreSQL
By default, the web app uses an SQLite database. SQLite is great for quick demos, but not for production environments.
The following docker compose file starts Papermerge DMS with a PostgreSQL 16.1 database:
services:
  webapp:
    image: papermerge/papermerge:3.3b21
    environment:
      PAPERMERGE__SECURITY__SECRET_KEY: 12345
      PAPERMERGE__AUTH__USERNAME: admin
      PAPERMERGE__AUTH__PASSWORD: admin
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
    ports:
      - "12000:80"
    depends_on:
      - db
  db:
    image: postgres:16.1
    volumes:
      - pgdata:/var/lib/postgresql/data/
    environment:
      POSTGRES_PASSWORD: jumbo
      POSTGRES_DB: pmgdb
      POSTGRES_USER: coco
    healthcheck:
      test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
volumes:
  pgdata:
Start it with:
$ docker compose up
You can access the Papermerge DMS user interface using any modern web browser (e.g. Firefox, Chrome). Open your web browser and point it to http://localhost:12000.
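🥳 Congratulations! 🥳 You have two microservices running: the web app and a PostgreSQL 16.1 database. Because of depends_on, the web app is started only after the db container has been started. Note that the short depends_on form used here waits for the container to start, not for its healthcheck to pass.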
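To make the web app wait until PostgreSQL actually accepts connections, and not merely until its container has started, the long form of depends_on can be tied to the healthcheck that db already defines. The fragment below is a minimal sketch rather than part of the official setup; it assumes a Docker Compose release that supports depends_on conditions and shows only the keys that change.

```yaml
# Fragment only: replaces the short "depends_on: - db" form on webapp.
services:
  webapp:
    depends_on:
      db:
        # start webapp only after the db healthcheck (pg_isready) passes
        condition: service_healthy
```

With this form, Compose holds back webapp until pg_isready succeeds, using the interval, timeout, retries and start_period values already set on db.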
The format of PAPERMERGE__DATABASE__URL is documented in database settings.
However, this setup has a problem 🙁. To see it, upload a couple of documents. Now, if you remove the docker containers with:
$ docker compose down
And then bring them back with:
$ docker compose up
You will notice that all documents "sort of disappeared". You can still see the documents' titles, but opening them results in errors.
Why so?
The problem is that the storage for uploaded documents is not persistent, i.e. the uploaded PDF files live inside the docker container, and when the container is removed, so are our documents! Only the document titles are still there, because that part is stored in the database, which at this point is persistent.
Note
Uploaded files are NOT stored in the database! They are stored in a file system directory called the media root.
Persistent Media Storage
Uploaded files are not stored in the database. They are stored in a file system directory called the media root. In order to persist uploaded files, you need to:
- Mount a persistent volume
- Point Papermerge DMS to upload files to the persisted directory
In our example we will create a docker compose volume named media_root and mount it to the directory /var/media/pmg inside the container. Finally, we use the PAPERMERGE__MAIN__MEDIA_ROOT environment variable to tell Papermerge DMS where to upload documents.
Here is the docker compose file:
services:
  webapp:
    image: papermerge/papermerge:3.3b21
    environment:
      PAPERMERGE__SECURITY__SECRET_KEY: 12345
      PAPERMERGE__AUTH__USERNAME: admin
      PAPERMERGE__AUTH__PASSWORD: admin
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg
    volumes:
      - media_root:/var/media/pmg
    ports:
      - "12000:80"
    depends_on:
      - db
  db:
    image: postgres:16.1
    volumes:
      - pgdata:/var/lib/postgresql/data/
    environment:
      POSTGRES_PASSWORD: jumbo
      POSTGRES_DB: pmgdb
      POSTGRES_USER: coco
    healthcheck:
      test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
volumes:
  pgdata:
  media_root:
Now you can:
$ docker compose up
and
$ docker compose down
as many times as you want! Your documents will still be there for you 🥳! We call that persistent 😎.
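The important parts of the compose file are the PAPERMERGE__MAIN__MEDIA_ROOT variable, the media_root volume mounted on webapp, and the media_root entry in the top-level volumes section.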
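As a condensed view, the fragment below repeats only the lines that make media storage persistent. It is an excerpt, not a complete compose file, and every name in it is taken from the file above.

```yaml
services:
  webapp:
    environment:
      # tells Papermerge DMS where to store uploaded files
      PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg
    volumes:
      # mounts the named volume at the media root inside the container
      - media_root:/var/media/pmg

volumes:
  # declares the named volume; a plain "docker compose down"
  # (without the -v/--volumes flag) keeps it
  media_root:
```

The path in PAPERMERGE__MAIN__MEDIA_ROOT and the mount target of the volume must be the same directory, otherwise uploads would still land inside the container's own file system.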
The following illustration visualizes the concept of persistent media storage:
Path Templates Worker
This one is optional, but you definitely want it on your team. To understand why, you need to understand what the Path Templates feature is all about.
services:
  webapp:
    image: papermerge/papermerge:3.3b21
    environment:
      PAPERMERGE__SECURITY__SECRET_KEY: 12345
      PAPERMERGE__AUTH__USERNAME: admin
      PAPERMERGE__AUTH__PASSWORD: admin
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
    volumes:
      - media_root:/var/media/pmg
    ports:
      - "12000:80"
    depends_on:
      - db
      - redis
  path_template_worker:
    image: papermerge/path-tmpl-worker:0.3
    command: worker
    environment:
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
      PATH_TMPL_WORKER_ARGS: "-Q path_tmpl -c 2"
    depends_on:
      - redis
  db:
    image: postgres:16.1
    volumes:
      - pgdata:/var/lib/postgresql/data/
    environment:
      POSTGRES_PASSWORD: jumbo
      POSTGRES_DB: pmgdb
      POSTGRES_USER: coco
    healthcheck:
      test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
  redis:
    image: bitnami/redis:7.2
    ports:
      - "6379:6379"
    environment:
      ALLOW_EMPTY_PASSWORD: "yes"
volumes:
  pgdata:
  media_root:
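The file above publishes the redis port 6379 on the host while ALLOW_EMPTY_PASSWORD is set to yes. Services defined in the same compose file reach each other over the project network by service name (which is why the URL is redis://redis:6379/0), so publishing the port is not needed for webapp and the worker to talk to redis. The fragment below is a hedged sketch of the redis service without the published port; it assumes nothing outside this compose project needs to connect to redis.

```yaml
# Fragment only: the other services stay as in the file above.
services:
  redis:
    image: bitnami/redis:7.2
    # no "ports:" section: redis is still reachable as redis:6379 from
    # webapp and path_template_worker, but is not exposed on the host
    environment:
      ALLOW_EMPTY_PASSWORD: "yes"
```

If you do need access from the host, for example for debugging, keeping the ports entry is fine on a trusted machine.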
S3 Object Storage
Papermerge DMS supports S3 object storage. This means that it can synchronize local media storage with remote S3; in other words, all your documents will be copied to remote S3 storage.
Note
If you delete a document, then both copies - the local one and the S3 one - will be deleted; this is what "synchronized" means. Stated another way: in simple scenarios, when you have only one web app running, the S3 storage is an exact copy of your local media storage.
To enable the S3 storage backend you need to add another worker - s3worker.
Here is the docker compose file:
services:
  webapp:
    image: papermerge/papermerge:3.3b21
    environment:
      PAPERMERGE__SECURITY__SECRET_KEY: 12345
      PAPERMERGE__AUTH__USERNAME: admin
      PAPERMERGE__AUTH__PASSWORD: admin
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
    volumes:
      - media_root:/var/media/pmg
    ports:
      - "12000:80"
    depends_on:
      - db
      - redis
  path_template_worker:
    image: papermerge/path-tmpl-worker:0.3.1
    command: worker
    environment:
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
      PATH_TMPL_WORKER_ARGS: "-Q path_tmpl -c 2"
    depends_on:
      - redis
  s3worker:
    image: papermerge/s3worker:0.4
    command: worker
    environment:
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
      PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg
      PAPERMERGE__S3__BUCKET_NAME: name-of-your-s3-bucket
      S3_WORKER_ARGS: "-Q s3 -c 2"
      AWS_REGION_NAME: eu-central-1
      AWS_ACCESS_KEY_ID: your-aws-access-key-id-here
      AWS_SECRET_ACCESS_KEY: your-aws-secret-access-key-here
    depends_on:
      - db
      - redis
    volumes:
      - media_root:/var/media/pmg
  db:
    image: postgres:16.1
    volumes:
      - pgdata:/var/lib/postgresql/data/
    environment:
      POSTGRES_PASSWORD: jumbo
      POSTGRES_DB: pmgdb
      POSTGRES_USER: coco
    healthcheck:
      test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
  redis:
    image: bitnami/redis:7.2
    ports:
      - "6379:6379"
    environment:
      ALLOW_EMPTY_PASSWORD: "yes"
volumes:
  pgdata:
  media_root:
Note that s3worker needs the SAME database URL, redis URL and media root as the web app. Also notice that the media root volume (the same one as for webapp) is mounted. For S3_WORKER_ARGS, the -Q s3 argument is the name of the queue; don't change this value. And -c 2 is the number of concurrent tasks your worker can perform, i.e. it can work on two tasks at the same time.
The picture below illustrates the relationship between services:
HTTP traffic, incoming on http://localhost:12000, is handled by webapp.
webapp communicates with s3worker via redis. They communicate via a dedicated message queue named s3.
Both webapp and s3worker have access to the same:
- database - PAPERMERGE__DATABASE__URL
- media storage - PAPERMERGE__MAIN__MEDIA_ROOT
- message broker - PAPERMERGE__REDIS__URL
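Since these three values must be identical for webapp and s3worker, one way to avoid them drifting apart is to define them once and reuse them through a YAML anchor. The sketch below relies on Compose extension fields (top-level keys starting with x-) and YAML merge keys; the name x-papermerge-shared is made up for this example, and only the environment sections are shown.

```yaml
# Hypothetical extension field holding the settings both services share.
x-papermerge-shared: &papermerge-shared
  PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
  PAPERMERGE__REDIS__URL: redis://redis:6379/0
  PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg

services:
  webapp:
    environment:
      # pulls in database, redis and media root settings
      <<: *papermerge-shared
      PAPERMERGE__SECURITY__SECRET_KEY: 12345
      PAPERMERGE__AUTH__USERNAME: admin
      PAPERMERGE__AUTH__PASSWORD: admin
  s3worker:
    environment:
      # same three values as webapp, by construction
      <<: *papermerge-shared
      PAPERMERGE__S3__BUCKET_NAME: name-of-your-s3-bucket
      S3_WORKER_ARGS: "-Q s3 -c 2"
```

Running docker compose config prints the fully resolved file, which is a quick way to confirm that both services ended up with the same values.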
For detailed configuration of s3worker, see the S3 Worker Settings section.
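The AWS values in the compose file are placeholders, and real credentials are better kept out of a file that may end up in version control. A possible approach, sketched here under the assumption that the values are provided through the shell environment or a .env file next to the compose file, is Compose variable interpolation. S3_BUCKET_NAME is a variable name chosen for this example; the fragment shows only the affected keys.

```yaml
services:
  s3worker:
    environment:
      # ${VAR:?message} makes Compose stop with the message if VAR is unset or empty
      PAPERMERGE__S3__BUCKET_NAME: "${S3_BUCKET_NAME:?set S3_BUCKET_NAME in .env}"
      # ${VAR:-default} falls back to the default if VAR is unset or empty
      AWS_REGION_NAME: "${AWS_REGION_NAME:-eu-central-1}"
      AWS_ACCESS_KEY_ID: "${AWS_ACCESS_KEY_ID:?set AWS_ACCESS_KEY_ID in .env}"
      AWS_SECRET_ACCESS_KEY: "${AWS_SECRET_ACCESS_KEY:?set AWS_SECRET_ACCESS_KEY in .env}"
```

The .env file would then contain plain KEY=value lines, one per variable.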
OCR Worker
The following docker compose file is for a simple webapp + one OCR worker scenario. Both webapp and the OCR worker have access to the same local media storage.
services:
  webapp:
    image: papermerge/papermerge:3.3b21
    environment:
      PAPERMERGE__SECURITY__SECRET_KEY: 12345
      PAPERMERGE__AUTH__USERNAME: admin
      PAPERMERGE__AUTH__PASSWORD: admin
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
      PAPERMERGE__OCR__LANG_CODES: "eng,deu"
      PAPERMERGE__OCR__DEFAULT_LANG_CODE: "deu"
    volumes:
      - media_root:/var/media/pmg
    ports:
      - "12000:80"
    depends_on:
      - db
      - redis
  ocr_worker:
    image: papermerge/ocrworker:0.3
    command: worker
    environment:
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
      PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg
      OCR_WORKER_ARGS: "-Q ocr -c 2"
    depends_on:
      - redis
      - db
    volumes:
      - media_root:/var/media/pmg
  db:
    image: postgres:16.1
    volumes:
      - pgdata:/var/lib/postgresql/data/
    environment:
      POSTGRES_PASSWORD: jumbo
      POSTGRES_DB: pmgdb
      POSTGRES_USER: coco
    healthcheck:
      test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
  redis:
    image: bitnami/redis:7.2
    ports:
      - "6379:6379"
    environment:
      ALLOW_EMPTY_PASSWORD: "yes"
volumes:
  pgdata:
  media_root:
You need to tell webapp which OCR languages are available with PAPERMERGE__OCR__LANG_CODES, and which language to use by default with PAPERMERGE__OCR__DEFAULT_LANG_CODE.
The picture below illustrates the docker compose setup.
Note that OCRWorker and webapp have access to the same database, redis and media storage.
OCR Worker + S3
OCR Workers may fetch the documents to be processed from S3 object storage, and store their processing results there as well. In other words, S3 object storage may be used as the storage medium between webapp and OCRWorkers. This is useful in advanced scenarios where OCR workers run on separate machines.
To instruct OCRWorker to get documents from S3 (and store results there), just pass the following options:
PAPERMERGE__S3__BUCKET_NAME
AWS_REGION_NAME
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
services:
  webapp:
    image: papermerge/papermerge:3.3b21
    environment:
      PAPERMERGE__SECURITY__SECRET_KEY: 12345
      PAPERMERGE__AUTH__USERNAME: admin
      PAPERMERGE__AUTH__PASSWORD: admin
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
      PAPERMERGE__OCR__LANG_CODES: "eng,deu"
      PAPERMERGE__OCR__DEFAULT_LANG_CODE: "deu"
    volumes:
      - media_root:/var/media/pmg
    ports:
      - "12000:80"
    depends_on:
      - db
      - redis
  ocr_worker:
    image: papermerge/ocrworker:0.3
    command: worker
    environment:
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
      PAPERMERGE__S3__BUCKET_NAME: name-of-your-s3-bucket
      AWS_REGION_NAME: eu-central-1
      AWS_ACCESS_KEY_ID: your-aws-access-key-id-here
      AWS_SECRET_ACCESS_KEY: your-aws-secret-access-key-here
      OCR_WORKER_ARGS: "-Q ocr -c 2"
    depends_on:
      - redis
      - db
  s3worker:
    image: papermerge/s3worker:0.4
    command: worker
    environment:
      PAPERMERGE__DATABASE__URL: postgresql://coco:jumbo@db:5432/pmgdb
      PAPERMERGE__REDIS__URL: redis://redis:6379/0
      PAPERMERGE__MAIN__MEDIA_ROOT: /var/media/pmg
      PAPERMERGE__S3__BUCKET_NAME: name-of-your-s3-bucket
      S3_WORKER_ARGS: "-Q s3 -c 2"
      AWS_REGION_NAME: eu-central-1
      AWS_ACCESS_KEY_ID: your-aws-access-key-id-here
      AWS_SECRET_ACCESS_KEY: your-aws-secret-access-key-here
    depends_on:
      - db
      - redis
    volumes:
      - media_root:/var/media/pmg
  db:
    image: postgres:16.1
    volumes:
      - pgdata:/var/lib/postgresql/data/
    environment:
      POSTGRES_PASSWORD: jumbo
      POSTGRES_DB: pmgdb
      POSTGRES_USER: coco
    healthcheck:
      test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
  redis:
    image: bitnami/redis:7.2
    ports:
      - "6379:6379"
    environment:
      ALLOW_EMPTY_PASSWORD: "yes"
volumes:
  pgdata:
  media_root:
A couple of remarks:
- OCRWorker does not have media storage mounted (as it gets its documents from S3, and also stores results in S3)
- OCRWorker relies on S3Worker to sync webapp's media storage with S3
- OCRWorker is free to run on any machine as long as it has access to the same redis and database as webapp
- There may be any number of OCRWorkers
- S3Worker and webapp need to run on the same machine though, because S3Worker needs to sync S3 storage with webapp's local media storage
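To illustrate the remark about running any number of OCRWorkers: within a single compose project, several identical workers can be requested with a replica count. This is only a sketch. It assumes a Docker Compose version that honors deploy.replicas for docker compose up, the count of 3 is arbitrary, and all replicas started this way run on the same host; workers on separate machines need their own deployment.

```yaml
# Fragment only: the rest of the ocr_worker definition stays as in the file above.
services:
  ocr_worker:
    deploy:
      # hypothetical count; each replica is a separate container
      # started with the same OCR_WORKER_ARGS ("-Q ocr -c 2")
      replicas: 3
```

This works here because ocr_worker publishes no host ports and mounts no local media volume, so the replicas do not compete for host resources such as a fixed port.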