Here is a list of terms, expressions and abbreviations used in this documentation. Many of the terms here are specific to the document management problem domain.
DMS = Document Management System
A digital archive is a fancy name for a document whose content won’t change anymore. Think of PDF documents: you can’t modify the content of a PDF file unless you have specialized software. PDF documents are a good example of digital archives. Any scanned document is automatically a digital archive as well.
Throughout the Papermerge documentation, the terms document, scanned document, PDF document and digital archive are used interchangeably and mean the same thing.
OCR is an abbreviation of optical character recognition. OCR is the process of extracting plain text (and associated information) from a file, document or image.
Example: John uses his mobile phone to take a photo of a paper bank statement. Let’s say an IBAN appears on that document. From the resulting photo (filename bank-statement.jpeg) John won’t be able to copy the IBAN and paste it into a WhatsApp message to his wife.
On the other hand, if the same bank statement photo is processed using optical character recognition (OCR), the text is extracted from the photo (for example as a bank-statement.txt file). John can then open bank-statement.txt, select the IBAN and copy/paste it into the WhatsApp chat with his wife.
OCR technology is widely used across many areas. It enables computers to read the text contained in pictures. Once computers know what text is inside images, users can search for specific terms across photos.
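To illustrate why extracted text makes photos searchable, here is a minimal Python sketch. It assumes the OCR step has already run and produced plain text for each photo; the function name search_extracted_text, the file names and the placeholder IBAN are hypothetical and are not part of Papermerge.

```python
def search_extracted_text(texts, term):
    """Return the names of files whose OCR text contains `term` (case-insensitive)."""
    needle = term.lower()
    return sorted(name for name, text in texts.items() if needle in text.lower())


# Hypothetical OCR results, keyed by the photo's file name.
# The IBAN below is a made-up placeholder, not a real account number.
extracted = {
    "bank-statement.jpeg": "Account holder: John\nIBAN: DE00 0000 0000 0000 0000 00",
    "receipt.jpeg": "Total: 12.50 EUR",
}

matches = search_extracted_text(extracted, "iban")
print(matches)  # ['bank-statement.jpeg']
```

Without the OCR step there would be no text to look through, only pixels, so a search like this could not find the bank statement.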
A scanned document is just a photo of the document, usually of higher quality than a photo taken with a mobile phone. Informally, scanners are specialized devices for taking photos of documents.
A page that ended up in the wrong document during scanning is called a strayed page.
For example, let’s say you scanned two contracts, A and B, and each contract is four pages long. The scanned document-A.pdf ended up with five pages and the scanned document-B.pdf with three:
document-A.pdf: A1, A2, B2, A3, A4
document-B.pdf: B1, B3, B4
Page B2 (a page from contract B) obviously does not belong to document A; it belongs to document B. In such a case we say that page B2 is a strayed page.
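The fix for the example above can be sketched in Python by modelling each document as a list of page labels. This is only an illustration of the idea, not Papermerge's implementation; the function move_stray_page and its parameters are hypothetical.

```python
def move_stray_page(source, target, page, position):
    """Take `page` out of `source` and insert it into `target` at `position` (0-based)."""
    if page not in source:
        raise ValueError(f"{page!r} is not in the source document")
    # Build new lists so the input documents are left untouched.
    new_source = [p for p in source if p != page]
    new_target = target[:position] + [page] + target[position:]
    return new_source, new_target


document_a = ["A1", "A2", "B2", "A3", "A4"]
document_b = ["B1", "B3", "B4"]

# B2 belongs between B1 and B3, i.e. at index 1 of document B.
fixed_a, fixed_b = move_stray_page(document_a, document_b, "B2", position=1)
print(fixed_a)  # ['A1', 'A2', 'A3', 'A4']
print(fixed_b)  # ['B1', 'B2', 'B3', 'B4']
```

After the move, both documents again have four pages each, in the right order.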
Documents which are in the user’s Inbox folder are called Incoming Documents.
Metacolumns are the columns displayed for the metadata defined on the current folder.