Efficient skew detection of printed document images based on novel combination of enhanced profiles

Document skew is often introduced during the capture stage of the document image processing pipeline and may seriously degrade the performance of the subsequent segmentation and recognition stages. Skew detection is commonly performed using horizontal projection profiles, and an approach based on vertical projections has recently been introduced. In this paper, we use the minimum bounding box area technique to combine a horizontal projection profile method with a new, reinforced vertical projection profile method. We are motivated by the observation that the horizontal and the novel vertical projection profiles are complementary. We claim that the proposed approach is more accurate than other state-of-the-art skew detection algorithms, addresses all the drawbacks of projection profile methods, is more resistant to noise and warping, and gives accurate results for any kind of printed document image. It can therefore be applied efficiently to historical machine-printed documents, multicolumn documents, and documents with figures and tables, and it is robust to any kind of script. Extensive experimental results on two databases covering different skew angle ranges, with representative printed documents of all kinds, as well as on printed documents from two historical books, demonstrate the efficiency of the proposed approach. A comparison with commercial products is also provided for several cases, in which the contribution of the proposed algorithm is demonstrated at the optical character recognition level. Finally, we analyze the accuracy of the main elements of the proposed technique.
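To make the minimum bounding box area criterion concrete, the following is a minimal sketch in Python, not the implementation described in this paper. It assumes the foreground pixels are available as a list of (x, y) coordinates, and it searches candidate skew angles by brute force, keeping the angle whose inverse rotation gives the smallest axis-aligned bounding box. The function names, the search range, and the step size are all illustrative assumptions.

```python
import math


def rotate_points(points, angle_deg):
    """Rotate (x, y) points counter-clockwise about the origin."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def bounding_box_area(points):
    """Area of the axis-aligned bounding box enclosing the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def estimate_skew(points, max_angle=5.0, step=0.1):
    """Return the candidate skew angle (degrees) with minimal box area.

    Hypothetical helper: each candidate angle is undone by rotating the
    points the opposite way; the candidate that leaves the tightest
    axis-aligned box is taken as the skew estimate.
    """
    best_angle, best_area = 0.0, float("inf")
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        candidate = -max_angle + i * step
        area = bounding_box_area(rotate_points(points, -candidate))
        if area < best_area:
            best_angle, best_area = candidate, area
    return best_angle


# Synthetic "page": a rectangular grid of points skewed by 3 degrees.
page = [(x, y) for x in range(0, 201, 10) for y in range(0, 101, 10)]
skewed_page = rotate_points(page, 3.0)
estimated = estimate_skew(skewed_page)
```

On its own, this criterion works only when the content has a roughly rectangular outline, which is one reason to combine it with projection profiles as the abstract describes.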
