A System for Offline Cursive Handwritten Word Recognition

Cursive handwriting recognition is a difficult problem because of the large variation in handwritten words and the overlaps and interconnections between neighboring characters. In this thesis we introduce a new approach to this problem, called the Hidden Markov Model with Multiple Observation Sequences (HMMMOS). A preprocessor extracts each word from a scanned document image and divides it into segments. A Neural Network (NN) classifier then estimates the likelihood of each possible character class given the segments and combinations of segments. These likelihoods, along with statistics computed from a lexicon, serve as input to a dynamic programming algorithm that recognizes the entire word. The dynamic programming algorithm can be viewed as a modified Viterbi algorithm for a Hidden Markov Model (HMM). Three types of Neural Networks are evaluated: a recurrent network, a Multilayer Perceptron (MLP), and a Hierarchical Mixture of Experts (HME) [Jordan & Jacobs 1994]. As an extension, the Viterbi algorithm for the HMM is implemented for a multiprocessor environment to speed up recognition.

Thesis Supervisor: Michael Jordan
Title: Professor, Department of Brain and Cognitive Sciences

Acknowledgments

This thesis would have been impossible without the help of my mentor at IBM Almaden Research Center, Dr. Jianchiang Mao. With his deep knowledge and keen insight into the topic, he guided me and served as an inspiration throughout the project. He taught me a great deal about the process of doing research. And most importantly, he encouraged me to keep my focus and work hard every day. I salute my thesis advisor, Prof. Michael Jordan, for his patience, input, and support. Many colleagues at IBM made my task easier. Prasun Sinha gave very valuable input on my thesis drafts and helped me at various stages of the project. My manager, K. M. Mohiuddin, was kind and always showed interest in my well-being and progress. An office-mate, Mary Mandl, was a joy to have around.
Others who gave encouragement and a helping hand are Sandeep Gopisetty, Raymond Lorrie, Andras Kornai, Tom Truong, and Peter DeSouza. Special thanks to Andras and Mao for giving me rides from work when I missed the shuttle. I thank a friend from MIT, Beethoven Cheng, for helping me edit my drafts for grammar and style. I am grateful to Allen Luniewsky, the 6-A liaison between IBM and MIT, and Prof. Jerome Saltzer, the 6-A advisor for IBM, who made sure I was plugged in and ready to work on my thesis. Hurrah to my cheering squad of friends: Tom and Jane Letterman, Luong Tran, Jay Lulla, Yee Chuan Koh, and Chung-hsiu Ma. Finally, I thank my parents, Perfecto and Carmencita Abayan, and my sister Maureen, for always being there for me. I dedicate this work to them.
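The recognition step described in the abstract — a dynamic programming pass that can be viewed as a modified Viterbi decode, combining per-segment character likelihoods from a neural network with lexicon statistics — can be sketched as follows. This is a minimal illustration of a standard Viterbi decode only, not the thesis's HMMMOS implementation; all names, shapes, and the toy inputs are hypothetical.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most-likely state sequence for an HMM (standard Viterbi decode).

    log_init:  (S,)   log prior over states (character classes)
    log_trans: (S, S) log transition scores, e.g. character-bigram
               statistics computed from a lexicon
    log_emit:  (T, S) per-observation log-likelihoods of each state,
               here standing in for neural-network classifier outputs
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]         # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)     # backpointers for path recovery
    for t in range(1, T):
        cand = score[:, None] + log_trans  # cand[i, j]: prev state i -> state j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 2 character classes, 3 segments whose classifier scores
# strongly suggest the sequence 0, 1, 0 under uniform transitions.
log_init = np.log([0.5, 0.5])
log_trans = np.log(np.full((2, 2), 0.5))
log_emit = np.log([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]])
best_path = viterbi(log_init, log_trans, log_emit)
```

In the thesis's setting the emission scores would come from the NN classifier applied to segments and segment combinations, and the transition scores from lexicon statistics; the multiprocessor extension parallelizes this same recurrence.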

[1] Alaa A. Kharbouch, et al. Three models for the description of language, 1956, IRE Trans. Inf. Theory.

[2] Markus Schenkel, et al. Off-line cursive handwriting recognition compared with on-line recognition, 1996, Proceedings of 13th International Conference on Pattern Recognition.

[3] Paul D. Gader, et al. Handwritten Word Recognition Using Segmentation-Free Hidden Markov Modeling and Segmentation-Based Dynamic Programming Techniques, 1996, IEEE Trans. Pattern Anal. Mach. Intell.

[4] Michael I. Jordan, et al. On Convergence Properties of the EM Algorithm for Gaussian Mixtures, 1996, Neural Computation.

[5] Jian Zhou, et al. Off-Line Handwritten Word Recognition Using a Hidden Markov Model Type Stochastic Network, 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[6] Bernard Widrow, et al. The basic ideas in neural networks, 1994, CACM.

[7] Yoshua Bengio, et al. Globally Trained Handwritten Word Recognizer Using Spatial Representation, Convolutional Neural Networks, and Hidden Markov Models, 1993, NIPS.

[8] Robert A. Jacobs, et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.

[9] Sargur N. Srihari, et al. Variable duration hidden Markov model and morphological segmentation for handwritten word recognition, 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[10] David Chapman, et al. Learning to See Where and What: Training a Net to Make Saccades and Recognize Handwritten Characters, 1992, NIPS.

[11] J.-C. Simon, et al. Off-line cursive word recognition, 1992, Proc. IEEE.

[12] Geoffrey E. Hinton, et al. Adaptive Mixtures of Local Experts, 1991, Neural Computation.

[13] Michael I. Jordan, et al. Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks, 1990, Cogn. Sci.

[14] Richard Lippmann, et al. Neural Network Classifiers Estimate Bayesian a posteriori Probabilities, 1991, Neural Computation.

[15] J. Friedman, et al. Multivariate adaptive regression splines, 1991.

[16] Steve Young, et al. Applications of stochastic context-free grammars using the Inside-Outside algorithm, 1990.

[17] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[18] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[19] Yann LeCun, et al. Learning processes in an asymmetric threshold network, 1986.

[20] Raouf F. H. Farag, et al. Word-Level Recognition of Cursive Script, 1979, IEEE Transactions on Computers.

[21] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[22] K. S. Fu, et al. Learning with Stochastic Automata and Stochastic Languages, 1976.

[23] Andrew J. Viterbi, et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, 1967, IEEE Trans. Inf. Theory.