A system for data extraction from forms of known class

In this paper, we describe a flexible and efficient system for processing forms of a known class. The form model is based on attributed relational graphs, and the system registers each form and locates its information fields using algorithms based on the hypothesize-and-verify paradigm. Special emphasis is placed on low-level processing, where an autoassociator-based connectionist model has proved successful at finding the instruction fields in very noisy forms.
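
As an illustration only, the hypothesize-and-verify idea behind form registration can be sketched in a few lines of Python. The sketch makes strong simplifying assumptions that are not taken from the paper: the form model is reduced to a list of labeled anchor points rather than an attributed relational graph, the only distortion is a translation, and the names `register_form`, `locate_field`, and `tol` are hypothetical. Each pairing of a model anchor with an observed anchor of the same label proposes a translation (hypothesize), and the translation is scored by how many model anchors it maps onto a matching observed anchor (verify).

```python
from itertools import product
from math import hypot


def register_form(model, observed, tol=3.0):
    """Estimate the translation that maps model anchors onto observed anchors.

    model, observed: lists of (label, x, y) tuples (an assumed, simplified
    stand-in for a graph-based form model).
    Returns (dx, dy, support), where support is the number of model anchors
    that land within `tol` of an observed anchor with the same label.
    """
    best = (0.0, 0.0, 0)
    for (m_label, mx, my), (o_label, ox, oy) in product(model, observed):
        if m_label != o_label:
            continue  # only pair anchors of the same type
        # Hypothesize: this pairing fixes a candidate translation.
        dx, dy = ox - mx, oy - my
        # Verify: count model anchors explained by the candidate.
        support = sum(
            any(
                label2 == label and hypot(x + dx - x2, y + dy - y2) <= tol
                for label2, x2, y2 in observed
            )
            for label, x, y in model
        )
        if support > best[2]:
            best = (dx, dy, support)
    return best


def locate_field(field_box, dx, dy):
    """Shift a model field box (x0, y0, x1, y1) into image coordinates."""
    x0, y0, x1, y1 = field_box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


# Hypothetical data: three model anchors, observed shifted by (12, -5),
# plus one spurious detection that no hypothesis should explain.
model = [("corner", 0, 0), ("corner", 100, 0), ("tee", 50, 40)]
observed = [("corner", 12, -5), ("corner", 112, -5), ("tee", 62, 35),
            ("corner", 300, 300)]

dx, dy, support = register_form(model, observed)
field = locate_field((10, 10, 90, 30), dx, dy)
```

Once the form is registered, the field boxes stored in the model can be transferred to the image with the recovered translation, which is what `locate_field` does. A realistic system would also have to handle rotation, scale, and missing or spurious anchors caused by noise.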
