A major problem in form-reading applications is that form fields cannot be located exactly, because form images are subject to nonlinear distortions. Such distortions appear, for example, on photocopied forms or on forms transmitted by fax. One way to solve this problem is to locate the form fields relative to the positions of the form lines. This paper describes a new method for finding pairs of corresponding form lines on a reference form and a filled form. The advantage of this method is that the corresponding line pairs can be used to map any pixel between the filled form and the reference form without any assumption about the kind of distortion. The core of the method is an algorithm based on A* search. Two sets of horizontal or vertical lines, one from the reference form and one from the filled form, are searched for pairs of corresponding lines. The algorithm's run time is low, and nonlinear distortions of the form images hardly influence its results. With increasing complexity (i.e., an increasing number of lines or decreasing image quality), the number of rejected form lines grows, but the error rate stays low.
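To make the idea of searching two line sets for corresponding pairs concrete, the following is a minimal sketch of an A*-style, order-preserving matcher. It is not the algorithm of this paper, whose cost function and rejection criteria are not given here. The function name `match_lines`, the `skip_cost` parameter, and the cost model (penalizing changes in the local offset between consecutive matched pairs, plus a fixed penalty per rejected line) are all assumptions chosen for illustration. Lines are represented only by their sorted positions along one axis.

```python
import heapq
from itertools import count


def match_lines(ref, fil, skip_cost=10.0):
    """Return order-preserving (ref_index, fil_index) pairs of minimal total cost.

    ref, fil: sorted line positions (e.g. y-coordinates of horizontal lines).
    Assumed cost model (illustrative only): matching a pair costs the change
    in offset relative to the previously matched pair; leaving a line
    unmatched costs skip_cost.
    """
    n, m = len(ref), len(fil)

    def heuristic(i, j):
        # Each match consumes one line from each set, so at least
        # |remaining_ref - remaining_fil| lines must still be skipped.
        return skip_cost * abs((n - i) - (m - j))

    tie = count()  # tie-breaker so the heap never compares states
    start = (0, 0, None)  # (next ref index, next fil index, last offset)
    frontier = [(heuristic(0, 0), next(tie), 0.0, start, ())]
    best = {start: 0.0}

    while frontier:
        _, _, g, state, pairs = heapq.heappop(frontier)
        i, j, offset = state
        if g > best.get(state, float("inf")):
            continue  # stale queue entry
        if i == n and j == m:
            return list(pairs)

        successors = []
        if i < n and j < m:
            # Match ref[i] with fil[j]; the first match is free.
            new_offset = fil[j] - ref[i]
            step = 0.0 if offset is None else abs(new_offset - offset)
            successors.append((step, (i + 1, j + 1, new_offset), pairs + ((i, j),)))
        if i < n:
            # Reject a reference line.
            successors.append((skip_cost, (i + 1, j, offset), pairs))
        if j < m:
            # Reject a filled-form line.
            successors.append((skip_cost, (i, j + 1, offset), pairs))

        for step, nxt, nxt_pairs in successors:
            new_g = g + step
            if new_g < best.get(nxt, float("inf")):
                best[nxt] = new_g
                f = new_g + heuristic(nxt[0], nxt[1])
                heapq.heappush(frontier, (f, next(tie), new_g, nxt, nxt_pairs))
    return []
```

For example, with reference lines at `[100, 200, 300, 400]` and filled-form lines at `[103, 206, 250, 309, 412]`, where the line at 250 stands in for a spurious stroke and the remaining offsets grow nonlinearly, the sketch returns `[(0, 0), (1, 1), (2, 3), (3, 4)]` and leaves the spurious line unmatched. Because the cost depends only on how the offset changes between neighbouring pairs, no global distortion model is assumed, which mirrors the motivation stated above.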