Glossary
Here is a list of terms, expressions and abbreviations used in this documentation. Many of the terms here are specific to the document management problem domain.
OCR
Abbreviation of optical character recognition. OCR is the process of extracting plain text (and associated information) from an image, a photo or a picture. Example: John takes a photo of a paper bank statement with his mobile phone. Let's say an IBAN appears on that document. From the resulting photo (file name bank-statement.jpeg), John cannot copy the IBAN and paste it into a WhatsApp message to his wife, because the photo contains only pixels, not text.
On the other hand, if the same bank statement photo is processed with optical character recognition technology (OCR), the text is extracted from the photo (for example, into a bank-statement.txt file). John can then open bank-statement.txt, select the IBAN, and copy and paste it into a WhatsApp chat with his wife.
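Once the text exists as a plain file, a program can pick out the IBAN too. The following is a minimal sketch, assuming the OCR output is already available as a string. The sample text, the `find_ibans` helper and the simplified pattern are hypothetical illustrations; the pattern only checks the general shape of an IBAN and does not validate its checksum.

```python
import re

# Simplified IBAN shape: country code, two check digits, then groups of
# alphanumeric characters, optionally separated by single spaces.
# This is an illustrative pattern, not a full IBAN validator.
IBAN_PATTERN = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)


def find_ibans(text: str) -> list[str]:
    """Return IBAN-like strings found in OCR-extracted text, spaces removed."""
    return [m.group(0).replace(" ", "") for m in IBAN_PATTERN.finditer(text)]


# Hypothetical content of bank-statement.txt after OCR.
sample_text = (
    "Account holder: John\n"
    "IBAN: DE89 3704 0044 0532 0130 00\n"
    "Balance: 100.00"
)
print(find_ibans(sample_text))
```

None of this is possible with bank-statement.jpeg alone, since the image holds no characters to match against.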
OCR technology is widely used across many areas. It enables computers to read the text contained in pictures. Once computers know what text is inside images, users can search for specific terms across photos.
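Searching across photos can be sketched as a lookup over the extracted text. This is a minimal illustration, assuming each image has already been OCRed and its text stored next to the file name. The `search_documents` function and the sample data are hypothetical and do not reflect how any particular document management system implements search.

```python
def search_documents(ocr_texts: dict[str, str], term: str) -> list[str]:
    """Return the names of files whose OCR-extracted text contains the term."""
    needle = term.casefold()  # case-insensitive comparison
    return sorted(
        name for name, text in ocr_texts.items() if needle in text.casefold()
    )


# Hypothetical OCR results: file name -> extracted text.
ocr_texts = {
    "bank-statement.jpeg": "IBAN: DE89 3704 0044 0532 0130 00",
    "invoice.jpeg": "Invoice No. 42, total due 100.00 EUR",
    "receipt.jpeg": "Thank you for your purchase",
}

print(search_documents(ocr_texts, "invoice"))
```

A real system would typically use a full-text index rather than a linear scan, but the principle is the same: the search runs over the text produced by OCR, not over the images themselves.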
A scanned document is just a photo of the document, usually of higher quality than a photo taken with a mobile phone. Put informally, scanners are specialized devices for taking photos of documents.
OCR Used as Verb
OCRs is a jargon term, a verb derived from the noun OCR. The expression "File X was OCRed" means that the optical character recognition process was performed on file X. Similarly, the expression "It OCRs the documents" reads "it uses optical character recognition technology on the documents", which means the same as "it extracts text from scanned documents".
Funnily enough, here is how to conjugate the verb OCR in the present tense:
Singular | Plural |
---|---|
I OCR | We OCR |
You OCR | You OCR |
He/she/it OCRs | They OCR |
And in the past tense (preterite):
Singular | Plural |
---|---|
I OCRed | We OCRed |
You OCRed | You OCRed |
He/she/it OCRed | They OCRed |
Incoming Documents
Documents in the user's Inbox folder are called Incoming Documents.
Metacolumn
Metacolumns are the columns displayed for the metadata defined on the current folder.
DMS
DMS = Document Management System