Language modeling using stochastic automata with variable length contexts

Abstract: It is well known that language models increase the accuracy of speech and handwriting recognizers, but large language models are often required to achieve low model perplexity (or entropy) while retaining adequate language coverage. We study three efficient methods for variable-order stochastic language modeling in the context of the stochastic pattern recognition problem. Two of these methods are existing techniques from the recent literature, and one is a new method based on a successful text compression technique. We give the results of a comparative analysis and demonstrate that the best performance is achieved by extending one of the existing techniques with elements of the newly developed method.
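To make "variable-order" modeling and "perplexity" concrete, here is a minimal sketch of a character-level model that predicts from the longest previously seen context and backs off to shorter ones. The abstract does not name the three methods or the compression technique, so everything here is an assumption made for illustration: the back-off uses a PPM-style escape estimate (escape mass proportional to the number of distinct symbols seen, without exclusions), and the names `VariableContextModel`, `max_order`, and `alphabet_size` are hypothetical. It should not be read as the authors' method.

```python
import math
from collections import Counter, defaultdict


class VariableContextModel:
    """Illustrative character model that backs off from long to short contexts."""

    def __init__(self, max_order=3, alphabet_size=256):
        self.max_order = max_order          # longest context length used (assumed)
        self.alphabet_size = alphabet_size  # size of the uniform fallback (assumed)
        self.counts = defaultdict(Counter)  # context string -> next-symbol counts

    def train(self, text):
        # Record each symbol under every context of length 0..max_order.
        for i, sym in enumerate(text):
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                self.counts[text[i - k:i]][sym] += 1

    def prob(self, history, sym):
        # Try the longest available context first; on a miss, pay an
        # escape probability and retry with a shorter context.
        p_escape = 1.0
        for k in range(min(self.max_order, len(history)), -1, -1):
            ctx = history[len(history) - k:]
            seen = self.counts.get(ctx)
            if not seen:
                continue  # context never observed: back off at no cost
            total = sum(seen.values())
            distinct = len(seen)
            if sym in seen:
                return p_escape * seen[sym] / (total + distinct)
            p_escape *= distinct / (total + distinct)
        # Symbol unseen in every context: spread the remaining mass uniformly.
        return p_escape / self.alphabet_size


def perplexity(model, text):
    """Return 2 ** (average negative log2 probability per symbol)."""
    log2_sum = 0.0
    for i, sym in enumerate(text):
        history = text[max(0, i - model.max_order):i]
        log2_sum += math.log2(model.prob(history, sym))
    return 2 ** (-log2_sum / len(text))


model = VariableContextModel(max_order=2)
model.train("ab" * 20)
print(perplexity(model, "abab"), perplexity(model, "zzzz"))
```

Text that resembles the training data is predicted from long contexts and yields low perplexity, while unfamiliar text falls through to the uniform fallback and yields high perplexity. Because this sketch omits exclusions, the probabilities sum to slightly less than one; a production estimator would normally correct for that.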
