Links Between Markov Models and Multilayer Perceptrons

The statistical use of a particular classic form of a connectionist system, the multilayer perceptron (MLP), is described in the context of the recognition of continuous speech. A discriminant hidden Markov model (HMM) is defined, and it is shown how a particular MLP with contextual and extra feedback input units can be considered as a general form of such a Markov model. A link between these discriminant HMMs, trained along the Viterbi algorithm, and any other approach based on least mean square minimization of an error function (LMSE) is established. It is shown theoretically and experimentally that the outputs of the MLP (when trained along the LMSE or the entropy criterion) approximate the probability distribution over output classes conditioned on the input, i.e. the maximum a posteriori probabilities. Results of a series of speech recognition experiments are reported. The possibility of embedding MLP into HMM is described. Relations with other recurrent networks are also explained. >

[1]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[2]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[3]  Frederick Jelinek,et al.  Speech Recognition by Statistical Methods , 1976 .

[4]  S. S. Marcus ERIS-context sensitive coding in speech perception , 1981 .

[5]  Michael D. Brown,et al.  An algorithm for connected word recognition , 1982, ICASSP.

[6]  Lalit R. Bahl,et al.  A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Hermann Ney,et al.  The use of a one-stage dynamic programming algorithm for connected word recognition , 1984 .

[8]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[9]  T. D. Harrison,et al.  Boltzmann machines for speech recognition , 1986 .

[10]  Lalit R. Bahl,et al.  Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11]  Robert W. Brodersen,et al.  An integrated-circuit-based speech recognition system , 1986, IEEE Trans. Acoust. Speech Signal Process..

[12]  C. J. Wellekens,et al.  Global connected digit recognition using Baum-Welch algorithm , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  D. Rumelhart Learning Internal Representations by Error Propagation, Parallel Distributed Processing , 1986 .

[14]  Sadaoki Furui,et al.  Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..

[15]  Richard P. Lippmann,et al.  Two-stage discriminant analysis for improved isolated-word recognition , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16]  Peter F. Brown,et al.  The acoustic-modeling problem in automatic speech recognition , 1987 .

[17]  John J. Hopfield,et al.  CONCENTRATION INFORMATION IN TIME: ANALOG NEURAL NETWORKS WITH APPLICATIONS TO SPEECH RECOGNITION PROBLEMS. , 1987 .

[18]  Hermann Ney,et al.  Training of phoneme models in a sentence recognition system , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[19]  Pineda,et al.  Generalization of back-propagation to recurrent neural networks. , 1987, Physical review letters.

[20]  David J. Burr Speech Recognition Experiments with Perceptrons , 1987, NIPS.

[21]  R. Lippmann,et al.  An introduction to computing with neural nets , 1987, IEEE ASSP Magazine.

[22]  Terrence J. Sejnowski,et al.  Parallel Networks that Learn to Pronounce English Text , 1987, Complex Syst..

[23]  Lokendra Shastri,et al.  Learning Phonetic Features Using Connectionist Networks , 1987, IJCAI.

[24]  C. J. Wellekens,et al.  Explicit time correlation in hidden Markov models for speech recognition , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[25]  Kevin J. Lang,et al.  Speech recognition using time‐delay neural networks , 1988 .

[26]  Esther Levin,et al.  Accelerated Learning in Layered Neural Networks , 1988, Complex Syst..

[27]  Fernando J. Pineda,et al.  Dynamics and architecture for neural computation , 1988, J. Complex..

[28]  A. Poritz,et al.  Hidden Markov models: a guided tour , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[29]  B. Merialdo Phonetic recognition using hidden Markov models and maximum mutual information training , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[30]  Hy Murveit,et al.  1000-word speaker-independent continuous-speech recognition using hidden Markov models , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[31]  Hervé Bourlard,et al.  A Continuous Speech Recognition System Embedding MLP into HMM , 1989, NIPS.

[32]  Hervé Bourlard,et al.  Generalization and Parameter Estimation in Feedforward Netws: Some Experiments , 1989, NIPS.

[33]  Hervé Bourlard,et al.  Speech pattern discrimination and multilayer perceptrons , 1989 .

[34]  Geoffrey E. Hinton Connectionist Learning Procedures , 1989, Artif. Intell..

[35]  Geoffrey E. Hinton,et al.  Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..

[36]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[37]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .