Automatic prototype extraction for adaptive OCR

A Bayesian method for isolating character bitmaps from paragraph-length samples of heavily degraded text images is demonstrated. The method requires a transcript of the text, but it is sufficiently robust to tolerate errors in transcripts obtained from multifont commercial OCR software. The resulting prototypes (labeled character images) are used to recognize additional text in the same document.
