Reading handwritten phrases on U.S. census forms

Commercial form-reading systems do not achieve acceptable accuracy when extracting data from forms filled out by hand, yet several important form-processing applications require the automated reading of handwritten responses. U.S. Census forms are a case in point. A database of form images containing actual responses received by the U.S. Census Bureau was made available by the National Institute of Standards and Technology (NIST) in December 1993. Several factors combine to make reading these forms challenging. The quality of the form images is often poor, and the handwritten responses are only loosely constrained in writing style, response format, and choice of text. The lexicons provided are large (10,000-50,000 entries), yet their coverage is incomplete (60%-70%). In this article we discuss our approach to automating the task of reading the census forms. We describe the subtasks of field extraction and phrase recognition and present multiclassifier control strategies for phrase recognition. When no rejects are allowed, the error rate of the system is 59%, with a lower bound of 40% imposed by the incomplete coverage of the lexicon. The article concludes with a discussion of experimental results and directions for future research.
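The relationship between lexicon coverage and the error-rate lower bound can be illustrated with a short calculation. The sketch below is not taken from the article: the function names are hypothetical, and it assumes that the recognizer must always answer with a lexicon entry (no rejects) and that every response whose true text is missing from the lexicon is therefore counted as an error.

```python
def error_lower_bound(lexicon_coverage: float) -> float:
    """Lowest possible error rate when no rejects are allowed.

    Assumption: a response whose true text is not in the lexicon can never
    be recognized correctly, so it always counts as an error.
    """
    if not 0.0 <= lexicon_coverage <= 1.0:
        raise ValueError("lexicon_coverage must be a fraction in [0, 1]")
    return 1.0 - lexicon_coverage


def in_lexicon_error(total_error: float, lexicon_coverage: float) -> float:
    """Error rate restricted to responses that the lexicon does cover.

    Assumption: all out-of-lexicon responses are errors, so the remaining
    errors are attributed to the covered fraction of responses.
    """
    if not 0.0 < lexicon_coverage <= 1.0:
        raise ValueError("lexicon_coverage must be a fraction in (0, 1]")
    floor = error_lower_bound(lexicon_coverage)
    if total_error < floor:
        raise ValueError("total_error cannot be below the coverage floor")
    return (total_error - floor) / lexicon_coverage


if __name__ == "__main__":
    coverage = 0.60      # lower end of the coverage range quoted above
    total_error = 0.59   # overall error rate quoted above, no rejects
    print(f"error floor: {error_lower_bound(coverage):.2f}")
    print(f"error on covered responses: {in_lexicon_error(total_error, coverage):.2f}")
```

Under these assumptions, 60% coverage gives the 40% floor quoted above, and the second function separates the errors that remain attributable to recognition from those forced by the lexicon.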
