Papermerge is an open source document management system (DMS) designed to work with scanned documents. Papermerge uses OCR (optical character recognition) to extract meaningful data from ingested documents. The extracted text is then used to index the documents and prepare them for full text search.
Papermerge is web-based software with the look and feel of a modern desktop application. It offers features like a dual panel document browser, folders for organizing documents, drag and drop, and tagging of documents and folders, so that you can efficiently store, organize and automate your digital archive workflows.
- Works well with PDFs
- Desktop-like user interface
- Dual panel mode
- OCR - used to extract text for document indexing
- OCRed text overlay (you can download a document with its OCRed text overlay)
- Full text search (supports multiple search engines)
- Tags - assign colored tags to documents or folders
- Folders - organize documents in folders
- User permissions management
- Full REST API (all features can be consumed via the REST API)
- Page management - delete, reorder, rotate, merge, move, extract
What It Does
- It extracts (using OCR) and indexes text from your documents
- It provides a modern, desktop-like user interface to easily find your documents
- It helps you instantly find your documents:
  - based on extracted text
  - based on tags and folders
- It helps you fix scanned document issues like:
  - deleting blank, semi-blank or simply irrelevant pages
  - moving stray pages between documents
  - changing the page order within a document
What It Doesn’t Do
- It does not take control of your documents. Documents are stored on the filesystem in a simple and intuitive manner, so that you can take a snapshot of your data at any time
- It does not get in your way when you make decisions about your data
- It does not overwrite your original documents
Right Tool for You?
To be efficient, you always need to choose the right tool for the problem. Because "document management" is a very generic term, I think it is worth defining what a document is in the context of this software.
What is a Document?
For Papermerge, a document is anything that is a good candidate for archiving: a piece of information which is not editable but which you need to store for future reference. Receipts are a perfect example. You don’t need to read receipts every day, but eventually you will need them for your tax declaration. In this sense, scanned documents, which are usually in PDF or TIFF format, are a perfect match.
PDF (Portable Document Format) is the de facto standard for storing archived documents. Strictly speaking, the archival standard is its PDF/A subset. PDF/A differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking and encryption.
Most modern office scanners output scanned files in PDF/A format. This is why PDF is practically synonymous with "document" in the context of Papermerge.
A photo of an A4 paper document taken with your phone is regarded by Papermerge as a full-fledged document, even though it is stored digitally in JPEG or PNG format. You can think of such a photo of a document as a low-quality scan.
What is Not a Document?
Office documents (ODT, DOCX, spreadsheets, presentations, etc.) and text files (notes) are out of scope. These files are usually editable, i.e. the user can alter the content of the document. Any alterable document format is out of scope for the Papermerge project.
Papermerge is simply not designed to store books. Yes, you can scan a book and import it into Papermerge, but again, this is not what Papermerge was designed for.
- User’s Manual
- REST API
- Command Line Utilities
- Contributor’s Manual