A Rule Based Approach for Skew Correction and Removal of Insignificant Data from Scanned Text Documents of Devanagari Script

In this paper we present a rule-based approach for removing insignificant data and correcting skew in scanned documents of Devanagari script. Developing an OCR system for Devanagari script is difficult, so the scanned documents must first be preprocessed properly, which requires removing noise and correcting skew in the image. The proposed system combines rule-based methods, morphological operations, and connected component labeling. The images used in the experiments are binarised grayscale images. Experimental results show that the presented method is robust for preprocessing scanned images of Devanagari text documents.
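As a rough illustration of how connected component labeling can support a rule for discarding insignificant data, the sketch below erases every foreground component smaller than a size threshold from a binary image. The abstract does not state the paper's actual rules, so the function name `remove_small_components`, the `min_size` threshold, and the choice of 8-connectivity are all assumptions made for this example, not the authors' method.

```python
from collections import deque


def remove_small_components(image, min_size):
    """Return a copy of a binary image (list of lists of 0/1) in which every
    8-connected foreground component with fewer than min_size pixels is erased.

    Hypothetical helper: the size rule is an assumed stand-in for the
    paper's rules, which the abstract does not describe.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    cleaned = [row[:] for row in image]
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects the pixels of one component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Assumed rule: components below the threshold count as noise.
            if len(component) < min_size:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned
```

For Devanagari text, a pure size rule like this one could also erase small but meaningful marks such as an anusvara dot, so a practical rule set would likely need to consider a component's position relative to the text line as well.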
