Imaging Glossary

Application Hosting – See “hosting.”

ASCII – American Standard Code for Information Interchange. Pronounced AS-kee. It’s the most popular coding method used to convert letters, numbers, punctuation and control codes into digital form. ASCII represents characters, numbers, punctuation marks, or signals in seven on-off bits. A capital “C,” for example, is 1000011, while a “3” is 0110011.
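The two bit patterns quoted above can be checked with a short Python sketch. The helper name to_ascii_bits is only illustrative and is not part of any imaging product.

```python
def to_ascii_bits(ch):
    """Return the 7-bit ASCII code of a single character as a bit string."""
    code = ord(ch)
    if code > 127:
        # Anything above 127 does not fit in seven bits and is not ASCII.
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    return format(code, "07b")


print(to_ascii_bits("C"))  # 1000011
print(to_ascii_bits("3"))  # 0110011
```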

Automated Retrieval – Using digital processes to identify and locate a stored image or electronic document after scanning and indexing.

Backfile Conversion – The process of scanning, indexing and storing a large backlog of paper, microfilm, or microfiche documents in preparation for an electronic document management system. Because the task is time-consuming and specialized, it is generally outsourced to a scanning service or imaging service.

Bar-Code – An optical, machine-readable representation of data about the object to which it is attached. Originally, barcodes represented data by varying the widths and spacing of parallel lines; these are referred to as linear or one-dimensional (1D) barcodes. Later they evolved into rectangles, dots, hexagons and other geometric patterns in two dimensions (2D). Barcodes were originally scanned by special optical scanners called barcode readers; later, scanners and interpretive software became available on devices including desktop printers and smartphones. In large-scale document scanning applications, barcodes are placed either on the page or on separator sheets to assign indexes to the file for future retrieval or workflow processing.

Batch/Box – By tracking a document scanning job by the batch or box, a specific document can be easily retrieved during the process if needed by the client.

BPO – Business Process Optimization. The discipline of adjusting a process to optimize a specified set of parameters without violating any constraint. The most common goals are minimizing cost and maximizing throughput and/or efficiency. This is one of the major quantitative tools in industrial decision making. When optimizing a process, the goal is to maximize one or more of the process specifications while keeping all others within their constraints.

Business Process Improvement (BPI) – A systematic approach used to help an organization optimize its underlying processes to achieve more efficient results. The methodology was first documented in H. James Harrington’s 1991 book Business Process Improvement. It is the methodology that both Process Redesign and Business Process Reengineering are based upon. BPI has been responsible for reducing cost and cycle time by as much as 90% while improving quality by over 60%.

Business Process Management (BPM) – BPM solutions are frameworks that can be used to develop, deploy, monitor, and optimize multiple types of process automation applications, including processes that involve both systems and people. Consider which processes are candidates for automation, and whether they require some degree of ad hoc processing or manual intervention.

Character Matching – An OCR technique. The software contains “templates” of possible characters. When the OCR software encounters a character in the scanned image, it compares the character to its library of pattern templates. If a template matches precisely, the software sends the ASCII equivalent of the character to the output file. Character matching is a very accurate OCR method if the input is properly controlled by a scanning expert. It is frequently used for automatic indexing of electronic documents.

Cloud Computing – Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a metered service over a network (typically the Internet). Computing clouds provide computation, software, data access, and storage resources without requiring cloud users to know the location and other details of the computing infrastructure. Within limits, cloud users can consume any amount of these resources without having to first acquire servers or other computing equipment.

Constrained Handprint Recognition – The ability of OCR to recognize hand-printed characters when the writer is restricted to certain rules, such as writing only in a box or “comb.” Remember how you had to print your name on the SAT form? That was constrained hand-printing.

Curling – Preparing paper to be used in document scanning devices – removing moisture, flattening, “fanning,” etc. – to prevent jams. Curling is also known as climatizing.

Database Matching – The process of comparing an entered data or index field or record against an existing database field or record as a validation step.
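As a minimal sketch of this validation step, the Python below checks a keyed index value against a set of known values. The function name, the normalization rule (trim and uppercase) and the sample policy numbers are all assumptions made for illustration; a real system would query its own database.

```python
def matches_database(entered, known_values):
    """Return True if the keyed value matches an existing database value."""
    # Hypothetical normalization: ignore surrounding spaces and letter case.
    normalized = entered.strip().upper()
    return normalized in known_values


# Hypothetical values standing in for an existing database field.
known_policy_numbers = {"PN-1001", "PN-1002"}

print(matches_database(" pn-1001 ", known_policy_numbers))  # True
print(matches_database("PN-1010", known_policy_numbers))    # False
```

A value that fails the match would typically be flagged for review rather than stored as an index.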

Data Compression – Reduces the amount of electronic “space” data takes up after scanning documents. Methods of data compression include replacing blank spaces with a character count, or replacing redundant data with shorter stand-in “codes.” No matter how data is compressed, it must be decompressed before it is used.
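Replacing a run of blank spaces with a character count is a form of run-length encoding. The Python sketch below is a simplified illustration of that idea, not the scheme used by any particular file format; the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Expand (character, count) pairs back into the original text."""
    return "".join(ch * count for ch, count in pairs)


line = "TOTAL" + " " * 12 + "42"
encoded = rle_encode(line)

# The 12 blank spaces are stored as a single (" ", 12) pair.
print(encoded)

# The data must be decompressed before it can be used again.
print(rle_decode(encoded) == line)  # True
```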

Data Entry – The process of a clerk or typist entering data into a database using a keyboard.  In the case of a document management system, the data is entered to create an index for later retrieval of the scanned documents. For business applications, the data is entered from a document into a database or business application.

Digital – The use of binary code to record information, as in a digital document. “Information” can be text in a binary code like ASCII or scanned images. Converting information to digital storage presents many advantages, mainly in ease of processing, access, storage and accuracy in transmission.

Digital Mailroom – According to Wikipedia, Digital mailroom is a term used to describe the automation of incoming mail processes. Using document scanning and document capture technologies, companies can digitize incoming mail and automate the classification and distribution of mail within the organization. Both paper and electronic mail (e-mail) can be managed through the same process allowing companies to standardize their internal mail distribution procedures and adhere to company compliance policies.

Digital Repository – A system that stores documents electronically, as opposed to a system that stores paper documents. The Digital Repository gives the user a means to archive, store, and retrieve electronic documents using a computer.

Document Capture – The process of prepping, scanning, and indexing paper, microfilm, or microfiche documents. The captured content can be converted to ASCII for transaction processing or stored in an electronic document management system.

Document Classification – Labeling a document as a particular “type” – an invoice, for example, a purchase order, or a contract. This can be done manually, but top outsourced imaging services offer autoclassification – complex sets of business rules that automatically classify documents based on the information they contain. A few vendors create unique document classification manuals for a Backfile conversion job, and certify employees and the process for accuracy before beginning the classification process.

Document Coding – The “tagging” of documents for entry into a database.

Document ID – A unique identifier for a specific document used in scan and workflow processing.

Document Management – The capture, indexing and intelligent retrieval of scanned documents in a digital system.  According to AIIM, document management technology helps organizations better manage the creation, revision, approval, and consumption of electronic documents. It provides key features such as library services, document profiling, searching, check-in, check-out, version control, revision history, and document security.

Document Prep – The process of removing staples, paper clips, dog ears, and folded pages to prepare the document for the most accurate and efficient (speedy) document scanning while minimizing misfeeds and jams.

Document Recognition – The ability to capture all the information on a page (text and images) and recognize not only characters, but page structure (e.g., number of columns) and images and artwork.

Document Retrieval – The ability to search for, select and display an image of a scanned document.

Document Type – The document type indicates what specific type of document is being captured, such as a resume, invoice, receipt, or contract. Common steps in document type identification include collecting samples of each document type, defining methods for efficiently identifying the type, and applying the appropriate rule sets for processing each document.

Double Key Verify – A method in which the same data is entered by two different operators and the two entries are compared by a computer. If both entries match, the result is passed on. If they do not, a third operator is charged with determining the correct character or value. This is similar to being asked to enter a new password twice.
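The comparison step can be sketched in a few lines of Python. The field names and values below are hypothetical, and the function only reports which fields disagree; deciding the correct value is still left to a third operator.

```python
def double_key_verify(first_entry, second_entry):
    """Compare two independently keyed records field by field.

    Returns (accepted, disputed): fields where both operators agree,
    and the names of fields that need review by a third operator.
    """
    accepted, disputed = {}, []
    for field in first_entry:
        if first_entry[field] == second_entry.get(field):
            accepted[field] = first_entry[field]
        else:
            disputed.append(field)
    return accepted, disputed


# Hypothetical entries keyed by two operators from the same invoice image.
operator_a = {"invoice_no": "10452", "amount": "118.50"}
operator_b = {"invoice_no": "10452", "amount": "113.50"}

accepted, disputed = double_key_verify(operator_a, operator_b)
print(accepted)  # {'invoice_no': '10452'}
print(disputed)  # ['amount']
```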

ECM – Enterprise Content Management. According to AIIM, ECM is the strategies, methods, and tools used to capture, manage, store, preserve, and deliver content and documents related to organizational processes. ECM tools and strategies allow the management of an organization’s unstructured information, wherever that information exists.

EDI – Electronic Data Interchange. Automatic paperless system that allows vendors to exchange invoices, purchase orders, etc., in a standard format.

Electronic Document – The set of electronic “images” from a paper document after it has been scanned.

Enterprise Scanning – A large-scale document scanning project collecting information and documents from one or more information networks. Consistent management policies and procedures are necessary.

File Format – How the data is organized to represent an electronic document stored on a computer system. Common document formats are TIFF, PDF, JPG, DOC, XLS, TXT, HTML, and XML.

File Forward – This is the opposite of backfile conversion. It is the process of automating all work from the current day forward.  File forward is more transaction-oriented, while Backfile conversion is digitizing the archives.

Full Text Retrieval – The ability to retrieve text files based on occurrences of certain words, digits, sentences or patterns of characters. You likely use a form of this every day – it’s called “Googling”.

Fuzzy Search – The ability to search on something approximately like the search terms you enter. For instance, misspelled words or parts of words become searchable. Some approximate matchers also treat transposition (swapping the positions of two adjacent letters in the string) as a primitive operation.
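One common way to build an approximate matcher is an edit distance that counts insertions, deletions, substitutions and adjacent transpositions as single operations (often called the optimal string alignment distance). The Python sketch below is one such approach under that assumption; it is not the algorithm of any specific search product, and the function names and the threshold of one edit are illustrative.

```python
def osa_distance(a, b):
    """Edit distance where an adjacent transposition counts as one operation."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent letters swapped: treat as a single operation.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_search(term, index_terms, max_distance=1):
    """Return the index terms within max_distance edits of the search term."""
    return [t for t in index_terms
            if osa_distance(term.lower(), t.lower()) <= max_distance]


print(fuzzy_search("invocie", ["invoice", "contract", "receipt"]))  # ['invoice']
```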

Handprint Recognition – The ability of OCR software to recognize hand-printed characters from a scanned document image.

Hosting – Traditionally, businesses have had to build and maintain infrastructure to run on-premises applications. With the Software-as-a-Service (SaaS) model, businesses can consume applications that are hosted online, enabling them to lower their costs by paying only for what they use, enjoy seamless and painless upgrades in functionality, and integrate easily with their existing data and systems.  Hosting can also be known as “the Cloud”.

ICR – Intelligent Character Recognition. This is an advanced form of optical character recognition (OCR) or, more specifically, a handwriting recognition system that allows fonts and different styles of handwriting to be learned by a computer during processing to improve accuracy and recognition levels.

Most ICR software has a self-learning system referred to as a neural network, which automatically updates the recognition database for new handwriting patterns. It extends the usefulness of scanning devices for the purpose of document processing, from printed character recognition (a function of OCR) to hand-written matter recognition.

Image Processing – The manipulation of images created by the document scanning process to improve quality, especially for OCR, or compress for reducing storage requirements.

Image Resolution – The fineness or coarseness of an image as a document is digitized, measured as dots per inch (DPI). IPS typically scans at 200 or 300 dpi.

Imaging – Recording “human-readable” images – pictures, images, text, paper documents, etc. – into digital formats.

Imaging System – The capture, indexing, and intelligent retrieval of scanned documents. Now more commonly known as a document management system (such as IPS’ ImageServ), so as not to be confused with medical imaging.

Indexing – A process used to consolidate related documents in an organized fashion.  Indexing stored documents is the great intellectual challenge in document retrieval.  Anyone can scan a piece of paper.  The hard part is devising an indexing scheme that describes every possible parameter (or “taxonomy”) for later searches, comparisons, and processing.

Intelligent Capture – The ability to accurately extract data from documents using rule sets and logic, as opposed to simply keying what is on the page WYSIWYG (what you see is what you get).

JPG – A commonly used method of “lossy” compression for digital photography (images) and scanned documents. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality. JPEG typically achieves 10:1 compression with little perceptible loss in image quality.

Key from Image – Manually entering data from a scanned document image into a form or database while the document image is displayed on the computer screen.

Key Word Search – Key words must be entered as metadata, and documents can then be retrieved by that set of words. This is different from Full Text Retrieval, in which any and every word on the page becomes a searchable term.

Legacy – An existing application – i.e. a program you already use – to which you wish to add imaging processes. Usage: “We underwent a backfile scanning project to convert our legacy archive into a searchable digital format.”

LOB Application – A “Line of Business” application for a particular document scanning service – such as a Backfile conversion – that applies specifically to a certain line of business, such as health care or legal.

Logical Document Separation – The active determination of logical document boundaries prior to, during, or after the scanning/imaging process. See “Document Classification.”

Managed Services – Any outsourced service that is performed at the client location using the client’s hardware, software, systems, and/or equipment.

Mark Sensing (a.k.a. OMR) – Mark Sense was a trade name used by IBM for electrographic forms and systems. It has since come to be used as a generic term for any technology allowing marks made using ordinary writing implements to be processed, encompassing both optical mark recognition and electrographic technology, because the user of a mark-sense form cannot generally tell if the marks are sensed electrically or optically. The term mark-sense is not generally used when referring to technology that distinguishes the shape of the mark; the general term optical character recognition is generally used when mark shapes are distinguished. Because the term mark-sense was originally a trade name, the Federal Government generally used the term electrographic.

Metadata – Metadata was traditionally found in the card catalogs of libraries and is currently referred to as data about data. Information has become increasingly digital and metadata is now used to describe digital data using standards specific to a particular discipline. In document management, the data that is used to store and find a specific document by describing the contents and context of documents is considered metadata. For example, in the medical community a document may include information on a patient or treatment protocol.  The metadata for this document would specify the patient name or prescriptions along with other data elements such as document dates and types.

MICR – Magnetic Ink Character Recognition.  The ability, by a scanning machine, to recognize characters printed with magnetic ink. MICR is used on checks to help banks sort them.

Numeric Machine Print Recognition – OCR that has been restricted to recognizing only numeric characters, greatly improving its accuracy on numbers-only text. How? By limiting the alternatives. OCR works by guessing; if something has the characteristics of a “B,” it also has many of the characteristics of an “8.” By removing the possibility of “B” from its guessing, OCR can home in more quickly on the right answer.

OCR – Optical Character Recognition. The ability of software to recognize and translate printed alphanumeric data from scanned images of documents into machine-readable (ASCII or formatted) text. OCR is used to extract usable information from scanned documents for transaction processing or auto-indexing within a document management system.  OCR is a key element in Intelligent Capture Systems.

Off-Site – Processing performed at the vendor’s location or processing center.

On-demand Scanning – Scanning documents as needed, rather than scanning an entire document archive in advance.

On-Site – Processing performed at the client’s location or facility.

Page Recognition – OCR software that can tell the difference between text on a page and other items, such as pictures and artwork.

Portable Document Format (PDF) – An open standard for document exchange. This file format (created by Adobe Systems in 1993) is used for representing documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it.

Record – (noun) In a database, a record is a group of related data items treated as one unit of information – for example, a policyholder’s name, address, social security number, etc.  Each item in the record is called a field.  OCR is used to extract information from a scanned electronic document and put usable data into a “record”.

Records Management – Content of long-term business value is deemed a record and is managed according to a retention schedule that determines how long the record is kept, based on either outside regulations or internal business practices. Any piece of content can be designated a record.

Retrieval Key – A word, number, or phrase associated with a document to aid in its retrieval from an electronic database. Retrieval keys are sometimes called descriptors. Several retrieval keys are often used together to fully locate a document; together, they are called an index.

SaaS (typically pronounced [sæs]) –  Software as a Service.  Sometimes referred to as “on-demand software”.  A software delivery model in which software and its associated data are hosted centrally (typically in the Internet cloud) and are typically accessed by users using a thin client, normally using a web browser over the Internet.

SaaS has become a common delivery model for many business applications, including accounting, collaboration, customer relationship management (CRM), enterprise resource planning (ERP), invoicing, human resource management (HRM), content management (CM), and service desk management. SaaS has been incorporated into the strategy of all leading enterprise software companies.

Scan – To convert physical microfilm, microfiche or paper documents into digital images also known as electronic documents.

Scan QC – Scan Quality Control. This can mean anything from glancing at the image during scanning to a complete review that produces 99.95% or better scan quality. Top performing document scanning services perform a 100 percent review of each image against the original to ensure that every page is scanned in full and is readable.

Scan Resolution – Similar to printing, scanning can be performed at various resolutions. Resolution is measured in dots per inch (DPI), also expressed as pixels per inch. Up to a point, the greater the resolution, the better the visible image quality. While the difference between 100 and 200 DPI is visually apparent, the difference between 200 and 300 DPI is hard to notice on most business documents. Higher resolution also creates a larger file and requires more disk space on the system.
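To illustrate why higher resolution needs more storage, the Python sketch below estimates the uncompressed size of a scanned page. The letter-size page (8.5 x 11 inches) and the 1-bit black-and-white depth are assumptions chosen for the example, and real files are usually smaller because they are compressed.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Estimate the raw size in bytes of a page scanned at the given DPI."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


for dpi in (100, 200, 300):
    size = uncompressed_bytes(8.5, 11, dpi)
    print(f"{dpi} DPI: {size:,} bytes")
# 100 DPI: 116,875 bytes
# 200 DPI: 467,500 bytes
# 300 DPI: 1,051,875 bytes
```

Because the pixel count grows with the square of the DPI, moving from 200 to 300 DPI increases the raw size by a factor of 2.25.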

Search/Retrieval – One of the greatest benefits of a strong ECM system is the ability to get out what you put in. By having strong indexing, taxonomy, and repository services, locating the information in your system should be a snap.

Skew – When a document is crooked when it’s scanned. Skew causes OCR errors and reduces legibility in document viewing. IPS checks all images against the physical document for skew errors as well as dog ears and overlapping pages.

Text/Image Retrieval – The ability to locate an electronic document (image) by using a full-text search, made possible by doing OCR on all document images.

Text Search – A technique for examining text files for occurrences of specific sets of characters, either in a string (a word or sentence) or in proximity (a certain word in the vicinity of another word). A “contextual search” involves finding electronic documents based on a string of characters that appear in them.
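The two kinds of text search described above can be sketched in Python as follows. The function names, the sample sentence and the word-gap limit are hypothetical, and the tokenizer is deliberately simple (lowercase letters, digits and apostrophes only).

```python
import re


def find_string(text, phrase):
    """String search: does the exact phrase occur in the text (ignoring case)?"""
    return phrase.lower() in text.lower()


def within_proximity(text, word_a, word_b, max_gap=5):
    """Proximity search: do the two words occur within max_gap words of each other?"""
    words = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_gap for i in positions_a for j in positions_b)


text = "The claim was approved after the adjuster reviewed the invoice."

print(find_string(text, "adjuster reviewed"))                     # True
print(within_proximity(text, "claim", "approved", max_gap=3))     # True
print(within_proximity(text, "claim", "invoice", max_gap=3))      # False
```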

TIFF (Tagged Image File Format) – A file format for images and documents that is used extensively in document management and other industries. TIFF predates PDF.

Workflow – The tools that electronically move content and documents through an identified business process, such as claims, underwriting, credit, loan and invoice processing. Workflow is now commonly associated with automation of the manual processes of managing documents. Workflow handles approvals and prioritizes the order in which documents are presented. In the case of exceptions, workflow also escalates decisions to the next person in the hierarchy. These decisions are based on pre-defined rules developed by system owners. Some organizations include paper-based processes in the definition of Workflow, but the term is intended to describe a digital process.