Cursive handwriting recognition using hidden Markov models and a lexicon-driven level building algorithm

The authors describe a method for the recognition of cursively handwritten words using hidden Markov models (HMMs). The modelling methodology has previously been applied successfully to the recognition of both degraded machine-printed text and hand-printed numerals. A novel lexicon-driven level building (LDLB) algorithm is proposed. It incorporates a lexicon directly within the search procedure and maintains a list of plausible match sequences at each stage of the search, rather than decoding using only the most likely state sequence. A word recognition rate of 93.4% is achieved using a 713-word lexicon, compared with 49.8% when the same lexicon is used to post-process the results produced by a standard level building algorithm. Various procedures are described for the normalisation of cursive script. Results are presented on a single-author database of scanned text. It is shown that very high reliability, approaching perfect recognition, can be achieved by using a threshold to reject word hypotheses to which the system assigns low confidence. At a 19% rejection rate, 99.2% of the accepted words appeared among the system's top two choices, and all 1645 accepted words were correctly recognised within the top eight choices.
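The abstract describes the LDLB search only in outline. The Python sketch below illustrates the general idea under simplifying assumptions that are not taken from the paper. Each character HMM is replaced by a hypothetical table of per-column log-likelihoods (`frame_scores`). A character's score over a segment of columns is simply the sum of its column scores, with segment lengths bounded by the hypothetical parameters `min_len` and `max_len`. The parameter `beam` caps how many prefix hypotheses are kept per end column. At each level (character position), the lexicon's prefix set restricts which characters may follow, and a short list of hypotheses survives instead of a single best path. The rejection step uses a hypothetical confidence measure, the log-score margin between the top two words, because the paper's actual measure is not stated here.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def ldlb(frame_scores: List[Dict[str, float]], lexicon: List[str],
         min_len: int = 1, max_len: int = 4, beam: int = 5) -> List[Tuple[str, float]]:
    """Lexicon-constrained level building over a hypothetical column-score table.

    frame_scores[t][ch] is an assumed log-likelihood of image column t under
    character ch; summing it over a segment stands in for scoring a character HMM.
    Returns (word, score) pairs sorted best first.
    """
    T = len(frame_scores)
    words = set(lexicon)
    prefixes = {w[:i] for w in lexicon for i in range(len(w) + 1)}
    alphabet = sorted({ch for w in lexicon for ch in w})

    # frontier[e] maps a prefix to its best log score when its last character ends at column e.
    frontier: Dict[int, Dict[str, float]] = {0: {"": 0.0}}
    complete: Dict[str, float] = {}

    for _level in range(max(len(w) for w in lexicon)):
        nxt: Dict[int, Dict[str, float]] = {}
        for start, hyps in frontier.items():
            for prefix, score in hyps.items():
                for ch in alphabet:
                    cand = prefix + ch
                    if cand not in prefixes:  # the lexicon prunes the search here
                        continue
                    seg = 0.0
                    for end in range(start + 1, min(start + max_len, T) + 1):
                        seg += frame_scores[end - 1][ch]  # character ch covers columns start..end-1
                        if end - start < min_len:
                            continue
                        slot = nxt.setdefault(end, {})
                        if score + seg > slot.get(cand, float("-inf")):
                            slot[cand] = score + seg
        # Keep a list of plausible prefixes per end column, not just the single best.
        frontier = {e: dict(heapq.nlargest(beam, hyps.items(), key=lambda kv: kv[1]))
                    for e, hyps in nxt.items()}
        # Any lexicon word that consumes all T columns is a complete hypothesis.
        for prefix, score in frontier.get(T, {}).items():
            if prefix in words and score > complete.get(prefix, float("-inf")):
                complete[prefix] = score
    return sorted(complete.items(), key=lambda kv: (-kv[1], kv[0]))


def margin_confidence(ranked: List[Tuple[str, float]]) -> float:
    """Hypothetical confidence: log-score gap between the best and second-best word."""
    if not ranked:
        return float("-inf")
    if len(ranked) == 1:
        return float("inf")
    return ranked[0][1] - ranked[1][1]


def decide(ranked: List[Tuple[str, float]], threshold: float) -> Optional[List[Tuple[str, float]]]:
    """Reject (return None) when confidence falls below the threshold."""
    return ranked if margin_confidence(ranked) >= threshold else None


def column(best: str, alphabet: str = "abctu", hit: float = 0.0, miss: float = -5.0) -> Dict[str, float]:
    """Hypothetical test helper: one column that strongly favours a single character."""
    return {ch: (hit if ch == best else miss) for ch in alphabet}


if __name__ == "__main__":
    ranked = ldlb([column(ch) for ch in "ccaatt"], ["cat", "cut", "cab"])
    print(ranked)  # [('cat', 0.0), ('cab', -10.0), ('cut', -10.0)]
    print(decide(ranked, threshold=5.0) is not None)  # True: margin of 10.0 passes
```

In this toy run, six columns favour `c`, `c`, `a`, `a`, `t`, `t`. Only prefixes of `cat`, `cut`, and `cab` are ever expanded, and a ranked list of complete words is returned rather than a single decoded string. Raising the threshold in `decide` trades a higher rejection rate for more reliable accepted words, which is the kind of trade-off the abstract reports.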
