Filled-in Document Identification Using Local Features and a Direct Voting Scheme

This work presents an approach for identifying filled-in document images that combines local representations with a direct voting scheme on a k-nearest neighbors classifier. Each document class is represented by a large number of local feature vectors, selected from its reference image according to a given criterion. In the test phase, vectors are selected from the test image in the same way and used to classify it. The experimental results show that the parameterization is not critical and that good performance in terms of error rate and processing time can be obtained, even though the test documents contain a large proportion of filled-in regions that are absent from the reference images.
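The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature type, the selection criterion, or the search structure, so the raw 3x3 grey-level patches, the variance threshold, the brute-force neighbor search, and all function and parameter names below (`extract_local_vectors`, `build_reference_set`, `classify`, `patch`, `min_variance`) are assumptions chosen only for illustration.

```python
import heapq
from collections import Counter


def extract_local_vectors(image, patch=3, min_variance=1.0):
    """Return flattened patch vectors whose grey-level variance is at least
    min_variance (an assumed stand-in for the paper's selection criterion)."""
    height, width = len(image), len(image[0])
    vectors = []
    for r in range(height - patch + 1):
        for c in range(width - patch + 1):
            vec = [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            mean = sum(vec) / len(vec)
            variance = sum((x - mean) ** 2 for x in vec) / len(vec)
            if variance >= min_variance:
                vectors.append(vec)
    return vectors


def build_reference_set(reference_images, patch=3, min_variance=1.0):
    """Label every selected vector with the class of its reference image."""
    labelled = []
    for label, image in reference_images.items():
        for vec in extract_local_vectors(image, patch, min_variance):
            labelled.append((vec, label))
    return labelled


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(test_image, labelled, k=1, patch=3, min_variance=1.0):
    """Direct voting: each test vector casts one vote for the class of each
    of its k nearest reference vectors; the most-voted class wins."""
    votes = Counter()
    for vec in extract_local_vectors(test_image, patch, min_variance):
        # Brute-force search for simplicity; a real system would index the vectors.
        nearest = heapq.nsmallest(
            k, labelled, key=lambda item: squared_distance(vec, item[0])
        )
        votes.update(label for _, label in nearest)
    return votes.most_common(1)[0][0] if votes else None


def blank(size=6):
    return [[0] * size for _ in range(size)]


# Two toy "forms": one with a vertical rule, one with a horizontal rule.
form_a = blank()
for r in range(6):
    form_a[r][2] = 255
form_b = blank()
for c in range(6):
    form_b[2][c] = 255

references = build_reference_set({"form_a": form_a, "form_b": form_b})

# A test document: form_a plus a "filled-in" mark absent from the reference.
filled = [row[:] for row in form_a]
filled[4][4] = 255
print(classify(filled, references))
```

In this toy setting, the patches covering the filled-in mark have no close match in either class and contribute only a few scattered votes, while the patches covering the preprinted rule match the correct class exactly. This is the intuition behind voting over many local vectors rather than comparing whole images.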
