Form reading based on form-type identification and form-data recognition

Form reading technology based on form-type identification and form-data recognition is proposed. This technology addresses the difficulty of reading varied items on a fairly large number of different form types. Form-type identification consists of two parts: (i) extraction of targets, such as important keywords in a form, by matching recognised characters against word strings in a keyword dictionary, and (ii) analysis of the positional or semantic relationships between the targets by constellation matching between these targets and the word location information in the keyword dictionary. Form-data recognition consists of two parts: (i) extraction of a region of interest (ROI) containing the character string of an item by using layout knowledge of the identified form type, and (ii) character string recognition of the item by using linguistic constraints obtained from content knowledge of the form type. An experiment using 642 sample forms of 107 different types confirmed that the form-type identification method correctly identifies 97% of the 642 form samples at a rejection rate of 3%. Another experiment confirmed that the form-data recognition method correctly reads 95% of the items on the form samples.
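The two-step form-type identification described above can be illustrated with a small sketch. It is not the authors' implementation: the dictionary contents, the function names (`extract_targets`, `constellation_score`, `identify_form_type`), the similarity measure (Python's `difflib.SequenceMatcher`), and all thresholds are assumptions chosen for illustration. Step (i) fuzzily matches recognised words against dictionary keywords; step (ii) compares the relative offsets between matched keywords with those stored in the dictionary, so a uniform shift of the scanned page does not affect the score.

```python
from difflib import SequenceMatcher

# Hypothetical keyword dictionary: form type -> {keyword: expected (x, y) position}.
KEYWORD_DICTIONARY = {
    "invoice": {"INVOICE": (100, 20), "TOTAL": (400, 700), "DATE": (450, 60)},
    "tax_return": {"TAX": (80, 30), "INCOME": (120, 300), "DATE": (450, 900)},
}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def extract_targets(ocr_words, keywords, min_ratio=0.8):
    """Step (i): find the best OCR match for each dictionary keyword.

    ocr_words is a list of (text, (x, y)) pairs from character recognition.
    Returns {keyword: observed position} for keywords matched well enough.
    """
    targets = {}
    for keyword in keywords:
        best = max(ocr_words, key=lambda w: similarity(w[0], keyword), default=None)
        if best is not None and similarity(best[0], keyword) >= min_ratio:
            targets[keyword] = best[1]
    return targets


def constellation_score(targets, expected, tolerance=30.0):
    """Step (ii): fraction of dictionary keyword pairs whose observed
    relative offset agrees with the expected one within `tolerance`."""
    total_pairs = len(expected) * (len(expected) - 1) // 2
    if total_pairs == 0:
        return 0.0
    names = sorted(targets)
    agreeing = 0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dx_obs = targets[b][0] - targets[a][0]
            dy_obs = targets[b][1] - targets[a][1]
            dx_exp = expected[b][0] - expected[a][0]
            dy_exp = expected[b][1] - expected[a][1]
            if abs(dx_obs - dx_exp) <= tolerance and abs(dy_obs - dy_exp) <= tolerance:
                agreeing += 1
    return agreeing / total_pairs


def identify_form_type(ocr_words, dictionary=KEYWORD_DICTIONARY, reject_below=0.6):
    """Return the best-scoring form type, or None to reject the form."""
    best_type, best_score = None, 0.0
    for form_type, expected in dictionary.items():
        targets = extract_targets(ocr_words, expected)
        score = constellation_score(targets, expected)
        if score > best_score:
            best_type, best_score = form_type, score
    return best_type if best_score >= reject_below else None


# Usage: a shifted scan with one OCR error ("0" for "O") is still identified.
scanned = [("INV0ICE", (110, 25)), ("TOTAL", (412, 703)), ("DATE", (458, 66))]
print(identify_form_type(scanned))
```

Returning `None` when no score reaches the threshold mirrors the idea of a rejection rate: a form that matches no known constellation is set aside rather than forced into the nearest type.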
