Continuous speech recognition by connectionist statistical methods

Over the period 1987-1991, a series of theoretical and experimental results suggested that multilayer perceptrons (MLPs) are an effective family of algorithms for smoothly estimating the high-dimensional probability density functions that are useful in continuous speech recognition. The early form of this work focused on hidden Markov models (HMMs) that are independent of phonetic context; more recently, the theory has been extended to context-dependent models. The authors review the basic principles of their hybrid HMM/MLP approach and describe a series of improvements analogous to the system modifications adopted by the leading conventional HMM systems over the last few years. Some of these methods directly trade increased computational complexity for reduced memory and memory-bandwidth requirements. Results are presented on the widely used Resource Management speech database distributed by the US National Institute of Standards and Technology (NIST).
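The abstract does not spell out how MLP outputs are plugged into an HMM. The following is a minimal sketch, assuming the scaling commonly used in hybrid HMM/MLP systems: the MLP's output for state q_k approximates the posterior P(q_k | x_t), and dividing it by the state prior P(q_k) gives a scaled likelihood p(x_t | q_k) / p(x_t) that can stand in for the emission probability during Viterbi decoding. The function names `scaled_log_likelihoods` and `viterbi`, the probability floor, and the two-state toy model are hypothetical illustrations, not the authors' system.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = -math.inf


def scaled_log_likelihoods(
    posteriors: Sequence[Sequence[float]],
    priors: Sequence[float],
    floor: float = 1e-10,  # hypothetical floor so log() stays defined
) -> List[List[float]]:
    """Turn per-frame MLP posteriors P(q_k | x_t) into log scaled likelihoods.

    By Bayes' rule, P(q_k | x_t) / P(q_k) = p(x_t | q_k) / p(x_t); because
    p(x_t) is the same for every state at frame t, the ratio can replace the
    emission likelihood without changing which path Viterbi selects.
    """
    return [
        [math.log(max(p, floor)) - math.log(prior) for p, prior in zip(frame, priors)]
        for frame in posteriors
    ]


def viterbi(
    emissions: Sequence[Sequence[float]],
    log_trans: Sequence[Sequence[float]],
    log_init: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the best state path and its log score (all inputs in the log domain)."""
    if not emissions:
        raise ValueError("need at least one frame")
    num_states = len(log_init)
    delta = [log_init[k] + emissions[0][k] for k in range(num_states)]
    backpointers: List[List[int]] = []
    for frame in emissions[1:]:
        new_delta, bp = [], []
        for j in range(num_states):
            # Best predecessor state i for entering state j at this frame.
            best_i = max(range(num_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + frame[j])
            bp.append(best_i)
        delta = new_delta
        backpointers.append(bp)
    # Trace back from the best-scoring final state.
    state = max(range(num_states), key=lambda k: delta[k])
    score = delta[state]
    path = [state]
    for bp in reversed(backpointers):
        state = bp[state]
        path.append(state)
    path.reverse()
    return path, score


if __name__ == "__main__":
    # Toy two-state left-to-right model; all numbers are illustrative.
    posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
    priors = [0.5, 0.5]
    log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
    log_init = [0.0, NEG_INF]
    path, score = viterbi(scaled_log_likelihoods(posteriors, priors), log_trans, log_init)
    print(path, score)
```

Working in the log domain avoids numerical underflow over long utterances, and the floor only matters when an MLP output is exactly zero. Here the priors are uniform, so the scaling shifts every state by the same constant; with non-uniform priors (as estimated from training alignments) the division changes which states are favored.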
