Local Skew Correction in Documents

In this paper we propose a technique for detecting and correcting the skew of text areas in a document image, where a single document may contain several text areas with different skew angles. First, text is localized by connected component analysis: the connected components of the document are extracted and filtered according to their size and geometric characteristics. Next, the candidate characters are grouped into words using a nearest-neighbor approach, and text lines of arbitrary skew are constructed from these words. The top-line and baseline of each text line are then estimated using linear regression. Neighboring text lines with similar skew angles are grown into text areas. A local skew angle is estimated for each text area, and each area is then corrected independently to horizontal or vertical orientation. The technique has been extensively tested on a variety of document images, and its accuracy and robustness are compared with those of existing techniques.
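
The regression step can be illustrated with a minimal sketch. It assumes each candidate character is given by its bounding box (left, top, right, bottom) in image coordinates with y increasing downward, and fits the top-line and baseline by ordinary least squares through the box tops and bottoms. The function names `fit_line` and `text_line_skew`, the box representation, and the choice of box centers as x-coordinates are illustrative assumptions, not details taken from the paper.

```python
import math


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # All x equal: the line is vertical and y-on-x regression is undefined.
        raise ValueError("points share one x value; regress x on y instead")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def text_line_skew(boxes):
    """Return (top_line_angle, baseline_angle) in degrees for one text line.

    boxes: iterable of (left, top, right, bottom) character bounding boxes
    (hypothetical input format, assumed for this sketch).
    """
    boxes = list(boxes)
    # Use the horizontal center of each box as the x-coordinate.
    top_points = [((l + r) / 2, t) for l, t, r, b in boxes]
    base_points = [((l + r) / 2, b) for l, t, r, b in boxes]
    top_slope, _ = fit_line(top_points)
    base_slope, _ = fit_line(base_points)
    return (math.degrees(math.atan(top_slope)),
            math.degrees(math.atan(base_slope)))


# Synthetic text line: ten 12x20 characters whose bottoms lie on a 5-degree line.
slope = math.tan(math.radians(5.0))
demo_boxes = []
for i in range(10):
    left = 10 + 20 * i
    right = left + 12
    bottom = 100 + slope * (left + right) / 2
    demo_boxes.append((left, bottom - 20, right, bottom))

top_angle, base_angle = text_line_skew(demo_boxes)
```

Because y grows downward in image coordinates, a positive angle here corresponds to a line that descends from left to right on screen. Real text would also need ascenders and descenders handled before fitting, which this sketch does not attempt.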
