Extraction of data from preprinted forms

The widespread use of printed forms for data acquisition makes automatic reading and analysis of their contents desirable. The components of a forms analysis system include conversion from paper to an image through scanning, image enhancement, document identification, data extraction, and data interpretation. This paper describes techniques for manipulating electronic images of forms in preparation for data interpretation. A combined feature-extraction and model-based approach is used for forms identification, registration, and field extraction. Forms identification is implemented with a neural network. The system is demonstrated on United States Internal Revenue Service forms.
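
To make the model-based field extraction step concrete, the following is a minimal sketch and not the system described in this paper. It assumes that a form model is simply a list of rectangular field regions in pixel coordinates, and that registration has already produced a translation-only offset between the scanned image and the model. The names Field and extract_fields, and the representation of the image as a list of pixel rows, are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical form-model entry: a named rectangle in model coordinates."""
    name: str
    top: int
    left: int
    height: int
    width: int


def extract_fields(image, fields, offset=(0, 0)):
    """Crop each modeled field from a scanned image.

    image  -- list of equal-length pixel rows (assumed representation)
    fields -- iterable of Field rectangles taken from the form model
    offset -- (dy, dx) shift from registration; translation only is an
              assumption of this sketch, not a claim about the paper
    """
    dy, dx = offset
    n_rows = len(image)
    n_cols = len(image[0]) if image else 0
    crops = {}
    for f in fields:
        r0, c0 = f.top + dy, f.left + dx
        # Reject fields that fall outside the image after registration.
        if r0 < 0 or c0 < 0 or r0 + f.height > n_rows or c0 + f.width > n_cols:
            raise ValueError(f"field {f.name!r} lies outside the image")
        crops[f.name] = [row[c0:c0 + f.width] for row in image[r0:r0 + f.height]]
    return crops


# Toy 4x5 "image" whose pixel value encodes its position as row*10 + column.
image = [[r * 10 + c for c in range(5)] for r in range(4)]
model = [Field("amount", top=1, left=1, height=2, width=2)]

aligned = extract_fields(image, model)
shifted = extract_fields(image, model, offset=(1, 1))
```

In this sketch the same form model is reused for every scan of a given form type, and only the registration offset changes from page to page. A real scan would also need rotation and scale handled during registration, which is omitted here.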