OCR is the process that extracts text from scanned documents and makes them searchable.
By default, OCR is triggered automatically when a document is uploaded. The OCR status is shown by a small circle next to the document’s title. When OCR completes, a new document version is created and the document becomes searchable.
You can disable automatic triggering; in that case, OCR runs only when you start it yourself.
Documents for which OCR was skipped are not searchable!
To disable automatic OCR, go to User Menu -> Preferences -> OCR -> Trigger -> Manual.
Default OCR Language
To perform OCR on a document, its language must be known beforehand. Choosing the OCR language for every uploaded document would be tedious, so a default OCR language is set in preferences and applied to each uploaded document.
To set the default OCR language, go to User Menu -> Preferences -> OCR -> Language.
Papermerge DMS features a real-time OCR status indicator: you can see a document’s OCR status change as it happens. The status is displayed as a small circle next to the document’s title and has the following meanings:
gray circle - status is unknown (figure 1)
orange still circle - document is scheduled for OCR (figure 2)
orange rotating circle - document’s OCR is in progress (figure 3)
green check - document’s OCR completed successfully and the document is now searchable (figure 4)
red cross - document’s OCR failed
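The status list above can be summarized as a small lookup table. The following is only an illustrative sketch: the `OCRStatus` names and the `is_searchable` helper are hypothetical and are not taken from Papermerge's code base.

```python
from enum import Enum


class OCRStatus(Enum):
    # Hypothetical names; Papermerge's internal identifiers may differ.
    UNKNOWN = "unknown"
    SCHEDULED = "scheduled"
    IN_PROGRESS = "in_progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Indicator shown next to the document's title for each status,
# as described in the list above.
INDICATORS = {
    OCRStatus.UNKNOWN: "gray circle",
    OCRStatus.SCHEDULED: "orange still circle",
    OCRStatus.IN_PROGRESS: "orange rotating circle",
    OCRStatus.SUCCEEDED: "green check",
    OCRStatus.FAILED: "red cross",
}


def is_searchable(status: OCRStatus) -> bool:
    """A document becomes searchable only after OCR completed successfully."""
    return status is OCRStatus.SUCCEEDED
```

Only the green check implies that the document is searchable; every other indicator means the OCRed version is not available yet.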
OCRed Text Layer
Once OCR completes successfully, a new document version is created: a version with an OCRed text layer. This version is available from the Download dropdown in the document view.
Document OCRed Text
To see the OCRed text of the entire document (to be exact, all pages of the last document version) from the Viewer, make sure that no pages are selected.
Selected Pages OCRed Text
If the document has many pages and you are interested in the OCRed text of only one or a few specific pages, select those pages first and then choose the “OCRed Text” item from the context menu.
When pages are selected, the “OCRed Text” menu item shows the OCRed text of the selected pages ONLY.
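The selection rule described in the two sections above (no selection means the whole document, otherwise only the selected pages) can be sketched as follows. The function `ocred_text` and its parameters are hypothetical and only illustrate the behavior; they are not Papermerge's API.

```python
from typing import Iterable, Optional, Sequence


def ocred_text(
    page_texts: Sequence[str],
    selected: Optional[Iterable[int]] = None,
) -> str:
    """Return the OCRed text for a document view.

    page_texts: OCRed text of each page of the last document version.
    selected:   1-based numbers of the selected pages, if any.
    """
    chosen = sorted(set(selected or ()))
    if not chosen:
        # No pages selected: show the text of all pages.
        return "\n".join(page_texts)
    for number in chosen:
        if not 1 <= number <= len(page_texts):
            raise ValueError(f"page {number} does not exist")
    # Pages selected: show the text of those pages only, in page order.
    return "\n".join(page_texts[number - 1] for number in chosen)
```

For example, `ocred_text(pages)` returns the text of every page, while `ocred_text(pages, [2])` returns the text of page 2 only.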
OCR Languages Support
Papermerge DMS uses Tesseract to extract text from scanned documents. Tesseract supports over 130 languages, so Papermerge DMS can handle documents in any of those languages.
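Which languages are actually usable depends on the language data installed alongside Tesseract. As a rough sketch, assuming the `tesseract` binary is on the `PATH` of the machine that runs OCR, the installed languages can be listed with Tesseract's `--list-langs` option. The helper names `parse_list_langs` and `installed_languages` are made up for this example and are not part of Papermerge.

```python
import re
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from the output of `tesseract --list-langs`.

    The output starts with a header line, followed by one language code
    per line (for example `eng`, `deu` or `chi_sim`). Lines that do not
    look like a language code, such as the header, are ignored.
    """
    codes = []
    for line in output.splitlines():
        line = line.strip()
        if re.fullmatch(r"[A-Za-z0-9_/\-]+", line):
            codes.append(line)
    return sorted(codes)


def installed_languages() -> List[str]:
    """Ask the local Tesseract installation for its available languages."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Both streams are inspected in case a given build prints the list
    # to stderr instead of stdout.
    return parse_list_langs(result.stdout + "\n" + result.stderr)
```

Calling `installed_languages()` on the OCR host returns a sorted list of language codes. A language that is missing from that list needs its Tesseract language data installed before documents in that language can be OCRed.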