Fast and Accurate Detection of Document Skew and Orientation

This paper presents a technique for detecting document skew and orientation. The technique rests on two observations: text images normally contain a large number of equidistant interline spacings, and character ascenders are statistically far more frequent than character descenders. Given a document image with arbitrary skew and orientation, white-run histograms are first constructed by scanning the document in the horizontal and vertical directions. Document skew is then estimated from the white runs that exactly span the interline spacing. Finally, document orientation is determined from the numbers of character ascenders and descenders, which are detected using the white runs that cross the interline spacing and lie over character ascenders and descenders. Experiments show that the proposed technique is fast, accurate, and capable of detecting arbitrary document skew and orientation.
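The first step, building white-run histograms from horizontal and vertical scans, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the image representation or how runs are selected, so the sketch assumes a binary image stored as a list of rows (1 = black, 0 = white) and counts only white runs bounded by black pixels on both sides. The function names `white_runs`, `white_run_histogram`, and `dominant_run_length` are hypothetical, and the skew-angle and ascender/descender steps are not covered.

```python
from collections import Counter


def white_runs(line):
    """Return lengths of white (0) runs bounded by black (1) pixels on both sides."""
    runs = []
    run = 0
    seen_black = False
    for px in line:
        if px == 1:
            # A white run closes only if a black pixel also preceded it.
            if seen_black and run > 0:
                runs.append(run)
            seen_black = True
            run = 0
        else:
            run += 1
    return runs  # leading and trailing white margins are ignored


def white_run_histogram(image, direction):
    """Histogram of white-run lengths for a horizontal or vertical scan."""
    if direction == "horizontal":
        lines = image
    elif direction == "vertical":
        lines = list(zip(*image))  # transpose so columns become scan lines
    else:
        raise ValueError("direction must be 'horizontal' or 'vertical'")
    hist = Counter()
    for line in lines:
        hist.update(white_runs(line))
    return hist


def dominant_run_length(hist):
    """Most frequent run length (ties go to the shorter run), or None if empty."""
    if not hist:
        return None
    return max(hist.items(), key=lambda kv: (kv[1], -kv[0]))[0]


# Synthetic page (an assumption for illustration): three solid "text lines",
# each 2 pixels thick, separated by 3 white rows, 8 pixels wide.
black_row = [1] * 8
white_row = [0] * 8
page = (
    [black_row] * 2 + [white_row] * 3
    + [black_row] * 2 + [white_row] * 3
    + [black_row] * 2
)

vertical_hist = white_run_histogram(page, "vertical")
horizontal_hist = white_run_histogram(page, "horizontal")
spacing = dominant_run_length(vertical_hist)
```

On this synthetic page the vertical scan yields a single histogram peak at the interline spacing, while the horizontal scan finds no bounded white runs, which is the kind of asymmetry a histogram-based method could exploit to tell the text-line direction apart. Real text would give noisier histograms, so peak selection would need more care than the simple maximum used here.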
