CSM-based feature extraction for degraded machine printed character recognition

This paper presents an OCR method for recognizing degraded characters in typewritten documents produced by a typesetting machine. The complementary similarity measure (CSM) is a well-known classification method that is widely applied in character recognition. In this work, the CSM is used not only as a classifier but also as a feature extractor for degraded character recognition. The resulting CSM feature vector is used to train a multilayer perceptron (MLP). Using the CSM as a feature extractor tends to improve the MLP's recognition performance and makes it well suited to rejecting unreliable inputs. Experimental results on n typewritten A4 page documents show that the model yields relevant and robust recognition of characters from poor-quality printed documents.
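As an illustration of how the CSM might serve as a feature extractor, the sketch below scores a binary character image against a set of reference templates and collects the scores into a feature vector that could be fed to an MLP. It assumes the commonly cited form of the measure, (a·e − b·c) / sqrt(T·(n − T)), computed on flattened binary images; the function names `csm` and `csm_features`, the 3×3 toy templates, and the one-template-per-class layout are hypothetical and are not taken from the paper.

```python
import math

def csm(f, t):
    """Complementary similarity measure between a binary input f and a
    binary template t (flat lists of 0/1 of equal length).
    Assumed form: (a*e - b*c) / sqrt(T * (n - T)), with T = a + b."""
    if len(f) != len(t):
        raise ValueError("input and template must have the same length")
    a = sum(1 for x, y in zip(f, t) if x == 1 and y == 1)  # black in both
    b = sum(1 for x, y in zip(f, t) if x == 0 and y == 1)  # black in template only
    c = sum(1 for x, y in zip(f, t) if x == 1 and y == 0)  # black in input only
    e = sum(1 for x, y in zip(f, t) if x == 0 and y == 0)  # white in both
    n = len(t)
    t_size = a + b  # number of black pixels in the template
    denom = math.sqrt(t_size * (n - t_size))
    if denom == 0:
        # All-white or all-black template: the measure is undefined,
        # so this sketch returns 0.0 by convention (an assumption).
        return 0.0
    return (a * e - b * c) / denom

def csm_features(f, templates):
    """One CSM score per template; the resulting list is the feature vector."""
    return [csm(f, t) for t in templates]

# Toy 3x3 templates (hypothetical): a vertical bar and a horizontal bar.
VERTICAL = [0, 1, 0,
            0, 1, 0,
            0, 1, 0]
HORIZONTAL = [0, 0, 0,
              1, 1, 1,
              0, 0, 0]

if __name__ == "__main__":
    features = csm_features(VERTICAL, [VERTICAL, HORIZONTAL])
    print(features)
```

In this sketch, the feature vector has one component per template, so a classifier trained on it sees how strongly the input resembles every reference shape rather than only the best match.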
