A Script-Independent Methodology for Optical Character Recognition

We present a methodology for OCR with the following properties: script-independent feature extraction, training, and recognition components; no separate segmentation at the character or word level; and automatic training on data that is likewise not presegmented. The methodology is adapted to OCR from continuous speech recognition, a field that has developed mature and successful technology based on hidden Markov models (HMMs). We demonstrate the script independence of the methodology with omnifont experiments on the DARPA Arabic OCR Corpus and the University of Washington English Document Image Database I.
