A Statistical Method for an Automatic Detection of Form Types

In this paper, we present a statistical method for classifying forms whose physical structure may vary from one writer to another. Automatic form segmentation extracts the physical structure, which is described by the set of main rectangular blocks. During the learning phase, blocks are matched within each class, the number of occurrences of each block is counted, and statistical block attributes are computed. During the identification phase, we handle block instability by introducing a block penalty coefficient that modifies the classical expression of the Mahalanobis distance. This coefficient depends on the probability of occurrence of the block. Experimental results on different form types are reported.
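The abstract does not specify how blocks are matched, which attributes are used, or the exact formula for the penalty coefficient. The sketch below is therefore a hypothetical illustration of the general idea, not the authors' method. It assumes that blocks are already matched and labeled with an id, that each block is described by `(x, y, width, height)`, that the Mahalanobis distance uses a diagonal covariance, and that the penalty coefficient is simply the occurrence probability `p` (so rarely seen, unstable blocks weigh less, and a missing frequent block costs `p * miss_cost`). The names `learn_class`, `penalized_distance`, `classify`, and `miss_cost` are all illustrative.

```python
import math
from statistics import mean, pvariance

# Hypothetical representation: a form maps a block id (assumed already matched
# across the samples of a class) to block features (x, y, width, height).
Form = dict[str, tuple[float, ...]]


def learn_class(samples: list[Form], eps: float = 1e-3) -> dict:
    """Learning phase: per block, occurrence probability plus feature means/variances."""
    n = len(samples)
    model = {}
    for b in {b for s in samples for b in s}:
        vecs = [s[b] for s in samples if b in s]  # samples in which block b occurs
        dims = list(zip(*vecs))                   # one tuple of values per feature
        model[b] = {
            "p": len(vecs) / n,                           # occurrence probability
            "mu": [mean(d) for d in dims],
            "var": [pvariance(d) + eps for d in dims],    # eps avoids division by zero
        }
    return model


def penalized_distance(form: Form, model: dict, miss_cost: float = 10.0) -> float:
    """Identification phase: diagonal Mahalanobis distance weighted by an
    assumed penalty coefficient equal to the block occurrence probability."""
    total = 0.0
    for b, st in model.items():
        if b in form:
            d2 = sum((x - m) ** 2 / v for x, m, v in zip(form[b], st["mu"], st["var"]))
            total += st["p"] * math.sqrt(d2)  # unstable (rare) blocks count less
        else:
            total += st["p"] * miss_cost      # missing a frequent block is costly
    return total  # blocks of the form that are unknown to the model are ignored


def classify(form: Form, models: dict) -> str:
    """Return the class whose model gives the smallest penalized distance."""
    return min(models, key=lambda c: penalized_distance(form, models[c]))


# Toy usage with made-up training forms for two classes.
invoice = [{"header": (0, 0, 100, 10), "table": (0, 20, 100, 50)},
           {"header": (1, 0, 99, 11), "table": (0, 21, 100, 49), "note": (0, 80, 50, 5)}]
letter = [{"header": (0, 0, 60, 8), "body": (0, 15, 80, 70)},
          {"header": (0, 1, 61, 8), "body": (1, 15, 79, 71)}]
models = {"invoice": learn_class(invoice), "letter": learn_class(letter)}
print(classify({"header": (0, 0, 100, 10), "table": (0, 20, 100, 50)}, models))  # invoice
```

Here the optional `note` block appears in only half of the invoice samples, so its occurrence probability is 0.5 and its absence from the query form adds only half the full missing-block cost.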
