Recognition and identification of form document layouts

We introduce a hierarchical tree representation of the logical structure of a form document. Because different forms can share the same logical structure, this representation alone is ambiguous; we resolve the ambiguity by also using the physical information of the blocks. A pixel tracing approach extracts the layout structure from form documents and requires less computation than the Hough transform. The algorithm applies to table form documents and has been tested on 50 different table forms.
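The ambiguity described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Block` class, the `(x, y, width, height)` box convention, the two signature functions, and the quantization grid are all assumptions introduced here for illustration. Two forms whose blocks nest the same way produce the same logical signature, and adding each block's normalized position and size separates them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    # Hypothetical block record: bounding box as (x, y, width, height) in pixels.
    box: Tuple[int, int, int, int]
    children: List["Block"] = field(default_factory=list)


def logical_signature(block: Block) -> str:
    """Encode only the tree shape (how blocks nest), ignoring geometry."""
    return "(" + "".join(logical_signature(c) for c in block.children) + ")"


def physical_signature(block: Block, root: Block = None, grid: int = 10):
    """Encode the tree shape plus each block's box, normalized to the root
    block and quantized to a coarse grid (an assumed tolerance scheme)."""
    if root is None:
        root = block
    rx, ry, rw, rh = root.box
    x, y, w, h = block.box
    geom = (
        round(grid * (x - rx) / rw),
        round(grid * (y - ry) / rh),
        round(grid * w / rw),
        round(grid * h / rh),
    )
    return (geom, tuple(physical_signature(c, root, grid) for c in block.children))


# Form A: one outer block split into two cells side by side.
form_a = Block((0, 0, 200, 100), [Block((0, 0, 100, 100)), Block((100, 0, 100, 100))])
# Form B: one outer block split into two cells stacked vertically.
form_b = Block((0, 0, 200, 100), [Block((0, 0, 200, 50)), Block((0, 50, 200, 50))])

same_logical = logical_signature(form_a) == logical_signature(form_b)
same_physical = physical_signature(form_a) == physical_signature(form_b)
print(same_logical, same_physical)
```

Here both forms have one parent block with two children, so the tree shape alone cannot tell them apart, while the quantized boxes differ. The grid size controls how much scanning jitter is tolerated before two layouts count as different; its value here is arbitrary.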
