Correcting document image warping based on regression of curved text lines

Image warping is a common problem when a page from a thick bound volume is scanned or photocopied: the area near the spine shows shading and curved text lines. This impairs readability and also reduces OCR accuracy. Building on our earlier attempt to correct such images, this paper proposes a simpler technique based on connected component analysis and regression. Compared with our earlier method, the present system is computationally less expensive and is resolution independent. The implementation of the new system and the resulting improvement in OCR accuracy are presented in this paper.
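The abstract does not say which regression model is used or which point represents each connected component, so the following is only a minimal sketch under stated assumptions: each component of one text line is reduced to a single (x, y) baseline point (for example the bottom centre of its bounding box), a quadratic curve is fitted to those points by least squares, and each point is then shifted vertically so that the fitted curve becomes a horizontal line. The names `fit_baseline` and `straighten_line` are hypothetical, and component extraction itself is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) baseline point of one connected component


def fit_baseline(points: Sequence[Point]) -> Callable[[float], float]:
    """Least-squares quadratic fit y = f(x) through the component points.

    The quadratic model is an assumption; the abstract only says "regression".
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if len(set(xs)) < 3:
        raise ValueError("need at least three distinct x positions")

    # Centre and scale x so the normal equations stay well conditioned.
    mean = sum(xs) / len(xs)
    scale = max(abs(x - mean) for x in xs) or 1.0
    us = [(x - mean) / scale for x in xs]

    # Normal equations for coefficients (c0, c1, c2) of c0 + c1*u + c2*u^2.
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    c0, c1, c2 = (m[i][3] / m[i][i] for i in range(3))

    def curve(x: float) -> float:
        u = (x - mean) / scale
        return c0 + c1 * u + c2 * u * u

    return curve


def straighten_line(points: Sequence[Point], reference_x: float) -> List[Point]:
    """Shift each point vertically so the fitted curve becomes flat.

    reference_x is an x position assumed to be undistorted (far from the spine);
    the curve height there is used as the target baseline.
    """
    curve = fit_baseline(points)
    target = curve(reference_x)
    return [(x, y - (curve(x) - target)) for x, y in points]
```

For a synthetic line whose baseline bends as it approaches the spine, `straighten_line(points, reference_x=0.0)` returns the same x positions with y values pulled back to the height of the curve at x = 0. In a full system the same per-column offset would be applied to the pixels of each component rather than to a single point.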
