Extraction of reference lines and items from form document images with complicated background

Abstract: Extracting reference lines and items is a fundamental task in form document analysis. Most previous studies have worked with binary images. This paper proposes a method for extracting lines from gray-level images by constructing a 2D pseudo Gaussian–Coiflet wavelet with adjustable rectangular support. We also present a method for extracting items that uses the extracted reference lines and multiresolution wavelet sub-images and is independent of the intensity of the strokes and backgrounds. Experimental results demonstrate the effectiveness of the proposed methods.
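
The idea of a line detector with adjustable rectangular support on a gray-level image can be illustrated with a rough sketch. The abstract does not give the definition of the pseudo Gaussian–Coiflet wavelet, so the kernel below is a stand-in, not the paper's wavelet: a zero-mean Mexican-hat profile across the line combined with Gaussian smoothing along it. The function names, the default support sizes, and the assumption that lines are darker than the background are all hypothetical choices made for illustration.

```python
import math


def rect_line_kernel(height=9, width=31, sigma=1.5):
    """Return (across, along) 1D profiles of a separable height x width kernel.

    Stand-in for the paper's wavelet (an assumption, not its definition).
    """
    if height % 2 == 0 or width % 2 == 0:
        raise ValueError("height and width must be odd")
    ys = [j - height // 2 for j in range(height)]
    # Mexican-hat profile across the line (negative second derivative of a Gaussian).
    across = [(1 - (y / sigma) ** 2) * math.exp(-0.5 * (y / sigma) ** 2) for y in ys]
    mean = sum(across) / height
    across = [a - mean for a in across]  # zero mean: a flat background gives no response
    xs = [i - width // 2 for i in range(width)]
    # Gaussian smoothing along the line, normalized to unit sum.
    along = [math.exp(-0.5 * (x / (width / 4.0)) ** 2) for x in xs]
    total = sum(along)
    along = [a / total for a in along]
    return across, along


def _filter_1d(values, kernel):
    """Correlate a sequence with an odd-length kernel, replicating the edge samples."""
    half = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamp index at the borders
            acc += w * values[j]
        out.append(acc)
    return out


def horizontal_line_response(image, across, along):
    """Filter a gray-level image (list of rows) for dark horizontal lines."""
    peak = max(max(row) for row in image)
    # Invert intensities so that dark lines produce positive responses.
    inverted = [[peak - v for v in row] for row in image]
    smoothed = [_filter_1d(row, along) for row in inverted]
    columns = [_filter_1d(list(col), across) for col in zip(*smoothed)]
    return [list(row) for row in zip(*columns)]
```

Changing `height` and `width` adjusts the rectangular support: a wider kernel averages over longer line segments, while a taller one tolerates thicker lines. Averaging the response over each row and taking the strongest rows gives candidate horizontal reference lines; vertical lines could be handled by transposing the image first.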
