Design of a linguistic postprocessor using variable memory length Markov models

We describe a linguistic postprocessor for character recognizers. The central module of our system is a trainable variable memory length Markov model (VLMM) that predicts the next character given a variable length window of past characters. The overall system is composed of several finite state automata, including the main VLMM and a proper noun VLMM. The best model reported in the literature (Brown et al., 1992) achieves 1.75 bits per character on the Brown corpus. On that same corpus, our model, trained on 10 times less data, reaches 2.19 bits per character and is 200 times smaller (≈160,000 parameters). The model was designed for handwriting recognition applications but could also be used for other OCR problems and speech recognition.
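To make the two central ideas concrete (prediction from a variable length window of past characters, and evaluation in bits per character), here is a minimal Python sketch. It is not the training procedure of the system described above: the class name `VariableOrderCharModel`, the parameters `max_order` and `min_count`, the rule "use the longest context seen at least `min_count` times", and the add-one smoothing are all assumptions chosen for illustration.

```python
import math


class VariableOrderCharModel:
    """Toy variable-order character model (illustrative assumption, not the paper's VLMM)."""

    def __init__(self, max_order=3, min_count=2):
        self.max_order = max_order   # longest context length considered
        self.min_count = min_count   # a context is trusted only if seen this often
        self.counts = {}             # context string -> {next char -> count}
        self.alphabet = set()

    def train(self, text):
        """Count next-character occurrences for every context of length 0..max_order."""
        self.alphabet.update(text)
        for i, ch in enumerate(text):
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                nexts = self.counts.setdefault(text[i - k:i], {})
                nexts[ch] = nexts.get(ch, 0) + 1

    def context_for(self, history):
        """Return the longest suffix of history seen at least min_count times."""
        for k in range(min(self.max_order, len(history)), 0, -1):
            ctx = history[-k:]
            if sum(self.counts.get(ctx, {}).values()) >= self.min_count:
                return ctx
        return ""  # fall back to the empty (order-0) context

    def prob(self, history, ch):
        """Add-one smoothed estimate; one extra slot is reserved for unseen characters."""
        seen = self.counts.get(self.context_for(history), {})
        total = sum(seen.values())
        return (seen.get(ch, 0) + 1) / (total + len(self.alphabet) + 1)

    def bits_per_char(self, text):
        """Average of -log2 P(char | history) over the text."""
        total_bits = 0.0
        for i, ch in enumerate(text):
            history = text[max(0, i - self.max_order):i]
            total_bits += -math.log2(self.prob(history, ch))
        return total_bits / len(text)


if __name__ == "__main__":
    model = VariableOrderCharModel(max_order=3, min_count=2)
    model.train("abracadabra" * 20)
    print(model.context_for("zzbra"))         # context actually used for prediction
    print(model.bits_per_char("abracadabra"))  # lower is better
```

The memory length varies per prediction because `context_for` shortens the window whenever the longer context was rarely observed. A figure such as 2.19 bits per character is this kind of average, computed on held-out text.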

[1]  Thomas M. Cover,et al.  Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) , 2006 .

[2]  Fernando Pereira,et al.  Weighted Rational Transductions and their Application to Human Language Processing , 1994, HLT.

[3]  Thomas M. Cover,et al.  A convergent gambling estimate of the entropy of English , 1978, IEEE Trans. Inf. Theory.

[4]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[5]  Robert L. Mercer,et al.  An Estimate of an Upper Bound for the Entropy of English , 1992, CL.

[6]  H. Kucera,et al.  Computational analysis of present-day American English , 1967 .

[7]  Isabelle Guyon,et al.  Structural Risk Minimization for Character Recognition , 1991, NIPS.

[8]  J. Cleary,et al.  Self-organized Language Modeling for Speech Recognition , 1997 .

[9]  I. Guyon,et al.  Advances in pattern recognition systems using neural network technologies , 1994 .

[10]  Yoshua Bengio,et al.  Globally trained handwritten word recognizer using spatial representation, space displacement neural networks and hidden Markov models , 1993 .

[11]  Dana Ron,et al.  The Power of Amnesia , 1993, NIPS.

[12]  Isabelle Guyon,et al.  On-line cursive script recognition using time-delay neural networks and hidden Markov models , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Claude E. Shannon,et al.  Prediction and Entropy of Printed English , 1951 .

[14]  Sadaoki Furui,et al.  Advances in Speech Signal Processing , 1991 .

[15]  Hermann Ney,et al.  Stochastic Grammars and Pattern Recognition , 1992 .