An Efficient Algorithm for Form Structure Extraction Using Strip Projection

This paper presents an efficient strip-projection approach to extracting form structures from form documents for office automation. Locating the data on a form requires first extracting and interpreting the form structure. We first segment each input form image into uniform vertical and horizontal strips. Since most form lines are vertical or horizontal, we project each vertical strip horizontally and each horizontal strip vertically; peaks in these projection profiles mark possible line locations in the form image. Starting from these candidate positions, we then extract the lines from the source image. After all lines have been extracted, a line-verification algorithm removes redundant lines and a line-merging algorithm links broken ones. Experimental results show that the proposed method extracts the form structure of an A4-sized document in about 3 seconds, which compares favorably with methods based on the Hough transform and with run-based line-detection algorithms.
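The abstract does not give implementation details, so the following Python sketch only illustrates the strip-projection idea under assumptions of our own. The image is a binary nested list (1 = ink). The names `profile_peaks`, `trace_run`, `extract_horizontal_lines`, and `extract_vertical_lines`, and the parameters `strip_width`, `min_fill`, `max_gap`, and `min_length`, are hypothetical. A "peak" here is the strongest row in each run of profile values above a fill threshold. Lines are traced along a single pixel row, and a simple set-based deduplication plus gap-based merge stands in for the paper's line-verification and line-merging algorithms, which the abstract does not describe.

```python
from typing import List, Set, Tuple

Image = List[List[int]]          # binary image: 1 = ink (black), 0 = background
Segment = Tuple[int, int, int]   # (row, start, end), inclusive pixel range


def profile_peaks(profile: List[int], threshold: float) -> List[int]:
    """Return one peak index per run of consecutive profile values >= threshold."""
    peaks, i, n = [], 0, len(profile)
    while i < n:
        if profile[i] >= threshold:
            start = i
            while i < n and profile[i] >= threshold:
                i += 1
            # Keep the position of the strongest response within the run.
            peaks.append(max(range(start, i), key=lambda k: profile[k]))
        else:
            i += 1
    return peaks


def trace_run(row: List[int], seed: int) -> Tuple[int, int]:
    """Extend left and right from a black seed pixel along one pixel row."""
    left, right = seed, seed
    while left > 0 and row[left - 1]:
        left -= 1
    while right + 1 < len(row) and row[right + 1]:
        right += 1
    return left, right


def extract_horizontal_lines(img: Image, strip_width: int = 32, min_fill: float = 0.75,
                             max_gap: int = 3, min_length: int = 20) -> List[Segment]:
    """Hypothetical sketch: detect horizontal lines via vertical-strip projections."""
    height, width = len(img), len(img[0])
    candidates: Set[Segment] = set()
    for x0 in range(0, width, strip_width):
        x1 = min(x0 + strip_width, width)
        # Horizontal projection of this vertical strip: ink count per row.
        profile = [sum(img[y][x0:x1]) for y in range(height)]
        for y in profile_peaks(profile, min_fill * (x1 - x0)):
            seed = next(x for x in range(x0, x1) if img[y][x])
            left, right = trace_run(img[y], seed)
            # A line crossing several strips is traced repeatedly; the set drops duplicates.
            candidates.add((y, left, right))
    merged: List[Segment] = []
    for y, left, right in sorted(candidates):
        if merged and merged[-1][0] == y and left - merged[-1][2] - 1 <= max_gap:
            # Link a broken line: same row and a small gap (or overlap) to the previous piece.
            prev = merged[-1]
            merged[-1] = (y, prev[1], max(prev[2], right))
        else:
            merged.append((y, left, right))
    return [s for s in merged if s[2] - s[1] + 1 >= min_length]


def extract_vertical_lines(img: Image, **kwargs) -> List[Segment]:
    """Vertical lines are horizontal lines of the transposed image: (column, top, bottom)."""
    transposed = [list(col) for col in zip(*img)]
    return extract_horizontal_lines(transposed, **kwargs)


# Demo: a 10 x 12 image with a broken horizontal line and one vertical line.
img = [[0] * 12 for _ in range(10)]
for x in range(12):
    if x != 6:
        img[2][x] = 1       # horizontal line on row 2 with a one-pixel break at x = 6
for y in range(10):
    img[y][9] = 1           # vertical line on column 9
print(extract_horizontal_lines(img, strip_width=4, max_gap=2, min_length=5))  # [(2, 0, 11)]
print(extract_vertical_lines(img, strip_width=4, max_gap=2, min_length=5))    # [(9, 0, 9)]
```

Transposing the image lets one routine serve both directions: projecting a horizontal strip vertically is the same as projecting the corresponding vertical strip of the transposed image horizontally. Because only the peak row of each run is traced, thick or skewed lines would need extra handling in a real implementation.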
