Requirements
Papermerge is web based application. As with any web based application it can be accessed and used from any modern web browser, like Mozilla Firefox, Chrome, Edge, or Safari.
Note
To use Papermerge all you need is just a modern web browser. Papermerge can be accessed and used from any operating system (provided a web browser). It can be accessed via web browsers from Desktop computers, tablets or mobile phones.
Like a typical web application it runs on server-side Linux or Unix-like computer. Thus, if you want to deploy and run Papermerge on your own, you need a Linux/Unix compatible operating system.
Note
To deploy Papermerge you need a Linux or Unix-like operating system.
Following installation guide explains how to install and configure Papermerge on Ubuntu or Debian based Linux computer. With minor adjustments you must be able to successfully install and run Papermerge on any flavor of modern Linux (or Unix) computer.
Software Requirements
In order to successfully deploy Papermerge you need following software:
- Python >= 3.7
- Django >= 3.1
- Tesseract
- Imagemagick
- Poppler - PDF operations
Python
Papermerge (server side) is written in Python programming language. The minimum Python version required is 3.7.
Django
Papermerge uses Django Web Framework for its web facing components. The
minimum required version for Django is 3.1. Generally speaking the fact that
Papermerge is written in Django is not important for setup. You won't need to
worry about exact version of Django (or other internal python libraries on
which Papermerge depends) required as this details are conveniently covered by
package management tools like pip
.
Imagemagick
Papermerge uses Imagemagick to convert between images format. You will need to make sure you have image magic installed.
Poppler
More exactly poppler utils are used. For exampple pdfinfo command line utility is used to find out number of page in PDF document.
Tesseract
If you never heard of [Tesseract software] (https://en.wikipedia.org/wiki/Tesseract_(software)) - it is google's open source Optical Character Recognition software. It extracts text from images. It works fantastically well for wide range of human languages.
In addition to the above, there are a number of Python requirements, all of
which are listed in a file called requirements/base.txt
in the project
root directory.
Hardware Requirements
Papermerge can run a single or multiple hosts (computers). OCR operations are performed by a component called worker. There can be one or more workers. For more efficient setups worker(s) should run on separate computer(s). The exact number of papermerge workers depends on your documents volume.
Single Host
On single host, both web component and worker components run on same computer.
The minimum hardware requirements in this case are:
- 1 GHz CPU
- 1 GB RAM
- 25 GB disk space
Note
Please keep in mind that Papermerge uses Tesseract for optical character recognition (OCR) operations. OCR is very CPU intense task. The rule here is simple - more powerful CPU and more RAM - better! A more powerful CPU (+ more RAM) will be able to complete OCR operations faster.
Multiple Hosts
In multiple hosts scenarios, the web component (i.e. the web application) requires less resources:
- 900 MHz CPU
- 512 MB RAM
- 15 GB disk space
Minimum requirements for 1 worker are:
- 1 GHz CPU
- 1 GB RAM
- 25 GB disk space