Connectionist probability estimators in HMM speech recognition

The authors are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist networks as probability estimators. They review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. They discuss the issues involved in constructing a connectionist HMM recognition system, including the choice of connectionist probability estimator. They describe the performance of such a system, using a multilayer perceptron probability estimator, evaluated on the speaker-independent DARPA Resource Management database. In conclusion, they show that a connectionist component improves a state-of-the-art HMM system.
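
The abstract does not spell out how a network's probability estimates enter the HMM. As a minimal sketch, assume the network outputs class posteriors P(q|x) for each HMM state q given an acoustic frame x. By Bayes' rule, dividing each posterior by the class prior P(q) gives a scaled likelihood p(x|q)/p(x), which can stand in for the HMM emission probability during decoding. The function name, the `floor` parameter, and the example numbers are hypothetical and are not taken from the paper.

```python
def scaled_likelihoods(posteriors, priors, floor=1e-8):
    """Convert class posteriors P(q|x) into scaled likelihoods p(x|q)/p(x).

    posteriors: network outputs for one frame, one value per HMM state.
    priors:     relative frequency of each state in the training data.
    floor:      hypothetical guard against division by a zero prior.
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    # Bayes' rule: p(x|q) / p(x) = P(q|x) / P(q)
    return [post / max(prior, floor) for post, prior in zip(posteriors, priors)]


# Illustrative values only: three states, one acoustic frame.
posteriors = [0.7, 0.2, 0.1]
priors = [0.5, 0.25, 0.25]
scaled = scaled_likelihoods(posteriors, priors)
```

The scaling factor p(x) is the same for every state at a given frame, so it does not change which state sequence a Viterbi-style decoder would prefer.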
