A Generic Form Processing Approach for Large Variant Templates

Form processing systems must be able to recognize variant forms that appear to be based on different templates but are actually only variations of the original. With such a system, a single representative template definition can cover a large variety of instances of the same logical template. We developed a method and system that, much like the human visual system, distinguishes between templates using salient features such as logos, dominant words, and geometric shapes, while ignoring minor details and variations. Once the system identifies the matching template, it decodes the content of the form. Our approach has been applied in several scenarios with encouraging results.
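The abstract does not say how salient features are compared against templates. Purely as an illustration, and not as the authors' method, the sketch below shows one simple way to make template identification tolerant of minor variations: weighted feature-coverage scoring with an acceptance threshold. It assumes an upstream detector has already reduced each form to a set of feature labels; the label scheme ("logo:...", "word:...", "shape:..."), the weights, the threshold of 0.6, and the names `Template`, `coverage`, and `identify_template` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Template:
    """A logical template described only by its salient features.

    Feature keys such as "logo:...", "word:..." or "shape:..." are
    hypothetical labels assumed to come from an upstream detector; weights
    let distinctive features (e.g. a logo) count more than common words.
    """
    name: str
    features: dict[str, float]


def coverage(form_features: set[str], template: Template) -> float:
    """Weighted fraction of the template's features found in the form.

    Extra features present in the form (minor details, filled-in text)
    do not lower the score, so variants of a template still score highly.
    """
    total = sum(template.features.values())
    if total == 0:
        return 0.0
    matched = sum(w for f, w in template.features.items() if f in form_features)
    return matched / total


def identify_template(form_features: set[str], templates: Iterable[Template],
                      threshold: float = 0.6) -> Optional[str]:
    """Return the best-covering template name, or None if no template passes the threshold."""
    best_name, best_score = None, 0.0
    for template in templates:
        score = coverage(form_features, template)
        if score > best_score:
            best_name, best_score = template.name, score
    return best_name if best_score >= threshold else None


# Hypothetical template definitions for illustration only.
templates = [
    Template("tax_form", {"logo:tax_agency": 3.0, "word:income": 1.0,
                          "word:deductions": 1.0, "shape:grid": 1.0}),
    Template("invoice", {"logo:acme": 3.0, "word:invoice": 1.0,
                         "word:total": 1.0, "shape:table": 1.0}),
]

# A variant form: one template word is missing and an extra word appears.
variant = {"logo:tax_agency", "word:income", "shape:grid", "word:signature"}
print(identify_template(variant, templates))  # tax_form (coverage 5/6)
```

Scoring by one-sided coverage rather than a symmetric measure such as Jaccard similarity is a deliberate choice in this sketch: content that appears only on the form does not count against a template, which loosely mirrors the idea of ignoring minor details. A form whose features match no template well enough is rejected rather than forced into the closest one.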
