Skew Detection and Correction Technique for Arabic Document Images Based on Centre of Gravity

Problem statement: Skew detection and correction is the first step in document analysis and understanding. Correcting a skewed scanned document image is very important because it directly affects the reliability and efficiency of the segmentation and feature extraction stages. Noise and variation in document resolution or type remain the two main challenges facing Arabic skew detection and correction methods. Approach: The proposed method inscribes the text of the document in an arbitrary polygon and derives the baseline from the polygon’s centroid. Results: The proposed method was tested on 150 scanned Arabic documents with different resolutions and fonts, drawn from sources such as journals, textbooks and newspapers as well as handwritten documents, and achieved an accuracy of 87%. Conclusion: The proposed method was efficient, simple and fast; it was not affected by noise and proved suitable for documents with different fonts and different resolutions.
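
The abstract does not spell out how the polygon is built or how the baseline is derived from its centroid, so the following is only a minimal sketch of the one step that is named: computing the centroid (centre of gravity) of a simple polygon with the standard shoelace formula. The function name `polygon_centroid` and the example vertices are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(vertices: List[Point]) -> Point:
    """Centroid of a simple (non-self-intersecting) polygon.

    Hypothetical helper: uses the standard shoelace formula. The paper's
    own polygon construction and baseline derivation are not reproduced.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")

    twice_area = 0.0  # 2 * signed area, accumulated edge by edge
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")

    # Centroid = sum / (6 * A), and 6 * A == 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Example: an assumed axis-aligned box around a block of text.
text_box = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centroid = polygon_centroid(text_box)
```

Because the formula divides by the signed area, the result is the same whether the vertices are listed clockwise or counter-clockwise.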
