A form dropout system

This paper describes a system for form dropout when the filled-in characters or symbols touch or cross the form frames and the form model is unknown. Because some character strokes touch or cross the form frames, we need to address three issues: (i) localization of form frames; (ii) separation of characters from form frames; and (iii) reconstruction of the strokes broken during separation. The form frame is located automatically by finding long straight lines in a data structure called a block adjacency graph. Form frame removal and character reconstruction are also implemented in this graph. When the same process is applied to a blank form, followed by connected component extraction and clustering, a form structure-based template is generated automatically; it includes the form model, the skew angle, and the preprinted data areas. Given the form template, our system can extract both handwritten and machine-typed filled-in data. Experimental results on three different types of forms demonstrate the performance of our system.
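The frame-localization step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary image stored as a list of rows (1 = foreground), uses single horizontal runs as graph nodes instead of merged blocks, and treats any run at least `min_length` pixels wide as a candidate frame segment. The names `horizontal_runs`, `adjacency`, `long_runs`, and `min_length` are hypothetical and chosen only for this illustration.

```python
from typing import List, Tuple

# A run is (row, start_col, end_col_exclusive); hypothetical representation.
Run = Tuple[int, int, int]


def horizontal_runs(image: List[List[int]]) -> List[Run]:
    """Return every maximal horizontal run of foreground (1) pixels."""
    runs: List[Run] = []
    for r, row in enumerate(image):
        start = None
        for c, px in enumerate(row):
            if px and start is None:
                start = c                      # a run begins here
            elif not px and start is not None:
                runs.append((r, start, c))     # the run ended before column c
                start = None
        if start is not None:                  # run reaches the right border
            runs.append((r, start, len(row)))
    return runs


def adjacency(runs: List[Run]) -> List[Tuple[int, int]]:
    """Link runs on consecutive rows whose column ranges overlap."""
    edges = []
    for i, (r1, s1, e1) in enumerate(runs):
        for j, (r2, s2, e2) in enumerate(runs):
            if r2 == r1 + 1 and s1 < e2 and s2 < e1:
                edges.append((i, j))
    return edges


def long_runs(runs: List[Run], min_length: int) -> List[Run]:
    """Keep runs wide enough to be candidate horizontal frame segments."""
    return [run for run in runs if run[2] - run[1] >= min_length]


# A vertical stroke crossing a horizontal frame line.
image = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
]
runs = horizontal_runs(image)
edges = adjacency(runs)
frame_candidates = long_runs(runs, min_length=6)
```

In this toy image the long run on row 2 is flagged as a frame candidate, and the adjacency edges record that the stroke runs above and below it touch the frame. That connectivity is the kind of information needed to decide which pixels to keep and which broken strokes to reconnect once the frame is removed.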
