A uniform framework of representation and structure reconstruction for generic form image

In this paper, we propose a novel uniform representation of generic forms, within which three typical categories are defined according to the logical layout as perceived by human vision: explicit, semiexplicit, and implicit style forms. Under this uniform framework, different criteria are defined and applied to the different kinds of forms, based on frame line extraction using single-connected chains. In addition, two subclasses of semiexplicit forms, mendable and unmendable, are identified and can be reconstructed by key point matching and bounding box dilation, so that a number of variations of form structure can be handled. Experimental results show that the proposed framework is effective and reasonable for layout analysis and structure reconstruction.
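The abstract does not spell out how bounding box dilation is applied, so the following is only a rough sketch of the general idea, not the procedure used in the paper. It assumes that the content blocks of a form are already available as axis-aligned bounding boxes and that two blocks belong to the same cell or field when their boxes overlap after each is grown by a fixed margin. The names `Box`, `dilate`, `overlaps`, `union`, and `group_by_dilation`, as well as the `margin` parameter, are hypothetical and introduced here for illustration.

```python
from typing import List, Tuple

# Hypothetical box type: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]


def dilate(box: Box, margin: int) -> Box:
    """Grow a box by `margin` pixels on every side."""
    left, top, right, bottom = box
    return (left - margin, top - margin, right + margin, bottom + margin)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def group_by_dilation(boxes: List[Box], margin: int) -> List[Box]:
    """Merge boxes whose dilated versions overlap.

    Repeats until no further merge happens, so chains of nearby boxes
    collapse into a single enclosing box. The returned boxes are the
    undilated unions of the original boxes.
    """
    groups = list(boxes)
    merged = True
    while merged:
        merged = False
        out: List[Box] = []
        for box in groups:
            for i, other in enumerate(out):
                if overlaps(dilate(box, margin), dilate(other, margin)):
                    out[i] = union(box, other)
                    merged = True
                    break
            else:
                out.append(box)
        groups = out
    return groups


if __name__ == "__main__":
    blocks = [(0, 0, 10, 10), (12, 0, 20, 10), (100, 100, 110, 110)]
    print(group_by_dilation(blocks, margin=2))
```

In this sketch the margin controls how large a gap between neighbouring blocks is still treated as belonging to one region; with `margin=0` only boxes that already overlap are merged.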
