Multi-lingual Offline Handwriting Recognition Using Hidden Markov Models: A Script-Independent Approach

This paper introduces a script-independent methodology for multilingual offline handwriting recognition (OHR) based on Hidden Markov Models (HMMs). The OHR methodology extends our script-independent approach to OCR of machine-printed text images. The feature extraction, training, and recognition components of the system are all designed to be script independent. The HMM training and recognition components are based on our Byblos continuous speech recognition system. The HMM parameters are estimated automatically from the training data, without the need for laborious manually written rules. The system does not require pre-segmentation of the data at either the word level or the character level. Thus, the system can handle languages with cursive handwritten scripts in a straightforward manner. The script independence of the system is demonstrated with experimental results in three scripts that exhibit significant differences in glyph characteristics: English, Chinese, and Arabic. Results from an initial set of experiments are presented to demonstrate the viability of the proposed methodology.
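To illustrate why an HMM-based recognizer needs no pre-segmentation, the following is a minimal sketch of Viterbi decoding over a sequence of feature frames. It is not the Byblos implementation: the two-state left-to-right model, the one-dimensional features, the Gaussian means, and the transition probabilities are all hypothetical values chosen for illustration. The point is only that the boundary between the two character models falls out of the decoding itself.

```python
import math

NEG_INF = float("-inf")

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path, computed in the log domain.

    log_init[s]     : log probability of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_emit[t][s]  : log likelihood of frame t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[prev] + log_trans[prev][s] + log_emit[t][s])
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy setup: state 0 models one character, state 1 the next.
means = [0.0, 3.0]                       # assumed unit-variance Gaussian means
frames = [0.1, 0.0, 0.2, 2.9, 3.1, 3.0]  # assumed 1-D feature per image column

# Log likelihoods up to an additive constant shared by all states.
log_emit = [[-0.5 * (x - m) ** 2 for m in means] for x in frames]
log_init = [0.0, NEG_INF]                # decoding must start in state 0
log_trans = [
    [math.log(0.9), math.log(0.1)],      # stay in state 0, or advance
    [NEG_INF, 0.0],                      # left-to-right: no return to state 0
]

path = viterbi(log_init, log_trans, log_emit)
boundary = path.index(1)                 # first frame assigned to state 1
```

In this toy case the decoder assigns the first three frames to state 0 and the rest to state 1, so the character boundary is a by-product of recognition rather than an input to it.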
