An Approach to Skew Detection of Printed Documents

In this paper, we propose an approach to estimating the text skew of printed documents. Skew estimation is an important step for preventing errors in later stages of an automatic document processing system, such as text segmentation. Our approach is based on the statistical analysis of the heights of the connected components. In a nutshell, the algorithm comprises four steps: (i) removal of redundant data; (ii) establishment of the connected components, which represent filled convex hulls around each text element; (iii) enlargement of these components using morphological dilation; (iv) extraction of the longest connected component to obtain a first estimate of the text skew. Based on this first estimate, the connected components are enlarged by oriented morphological dilation and the longest of them is extracted. Statistical moments are applied to this longest component to evaluate its orientation, which gives the global text skew of the document. At the end of this process, the original document is rotated back by the calculated angle. The performance of the proposed algorithm is examined by testing on a custom dataset. The results support the robustness of our approach.
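The moment-based orientation step can be illustrated with a minimal sketch. It assumes the common second-order central moment formula, theta = 0.5 * atan2(2*mu11, mu20 - mu02); the abstract does not state which moment formulation the authors use, so this is an illustration and not their implementation. The function name `component_orientation` and the synthetic pixel run are hypothetical.

```python
import math


def component_orientation(pixels):
    """Estimate the orientation (radians) of a pixel set.

    Assumption: uses the standard second-order central moments
    formula, which may differ from the paper's exact formulation.
    `pixels` is a sequence of (x, y) coordinates of one component.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("component has no pixels")

    # Centroid of the component.
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n

    # Second-order central moments.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)

    # Angle of the principal axis relative to the x axis.
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


# Hypothetical stand-in for the longest component: a thin run of
# pixels that steps one row for every ten columns.
line = [(x, x // 10) for x in range(200)]
angle_deg = math.degrees(component_orientation(line))
print(f"estimated skew: {angle_deg:.2f} degrees")
```

For this synthetic run the estimate comes out close to 5.7 degrees, matching a slope of about 0.1. In image coordinates the y axis usually points downward, so the sign of the angle has to be interpreted accordingly before rotating the document back.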
