Degraded dot matrix character recognition using CSM-based feature extraction

This paper presents an OCR method for recognizing degraded characters, applied to the 15-character reference number (RN) printed on an invoice document (an electricity bill) by a dot-matrix printer. First, the paper addresses the localization and extraction of the reference number, whose character tops or bottoms may or may not touch a printed reference line on the bill. When the RN touches this line, the extracted characters are severely degraded, with parts of their tops or bottoms missing. Second, a combined recognition method is used, based on the complementary similarity measure (CSM) and an MLP-based classifier. The CSM first accepts or rejects an incoming character. If the character is accepted, the CSM acts as a feature extractor and produces a ten-component feature vector. The MLP is then trained on these feature vectors. Using the CSM as a feature extractor tends to strengthen the MLP and make it well suited to rejection. Experimental results on electricity bills show that the model yields robust recognition of severely degraded printed characters.
