Continuous speech recognition

The authors focus on a tutorial description of the hybrid HMM/ANN method. The approach has been applied to large vocabulary continuous speech recognition, and variants are in use by many researchers, The method provides a mechanism for incorporating a range of sources of evidence without strong assumptions about their joint statistics, and may have applicability to much more complex systems that can incorporate deep acoustic and linguistic context. The method is inherently discriminant and conservative of parameters. Despite these potential advantages, the hybrid method has focused on implementing fairly simple systems, which do surprisingly well on large continuous speech recognition tasks, Researchers are only beginning to explore the use of more complex structures with this paradigm. In particular, they are just beginning to look at the connectionist inference of language models (including phonology) from data, which may be required in order to take advantage of locally discriminant probabilities rather than simply translating to likelihoods. Finally, the authors' current intuition is that more advanced versions of the hybrid method can greatly benefit from a perceptual perspective. >

[1]  Shun-ichi Amari,et al.  A Theory of Adaptive Pattern Classifiers , 1967, IEEE Trans. Electron. Comput..

[2]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[3]  P. Werbos,et al.  Beyond Regression : "New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .

[4]  R.W. Schafer,et al.  Digital representations of speech signals , 1975, Proceedings of the IEEE.

[5]  J. Baker,et al.  The DRAGON system--An overview , 1975 .

[6]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[7]  A. B. Poritz,et al.  Linear predictive hidden Markov models and the speech signal , 1982, ICASSP.

[8]  Shozo Makino,et al.  Recognition of consonant based on the perceptron model , 1983, ICASSP.

[9]  Hermann Ney,et al.  The use of a one-stage dynamic programming algorithm for connected word recognition , 1984 .

[10]  Frederick Jelinek,et al.  Self-organizing language modeling for speech recognition , 1990 .

[11]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[12]  A. Poritz,et al.  On hidden Markov models in isolated word recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  Sadaoki Furui,et al.  Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..

[14]  John Makhoul,et al.  BYBLOS: The BBN continuous speech recognition system , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15]  Bernard Widrow,et al.  Adaptive switching circuits , 1988 .

[16]  Teuvo Kohonen,et al.  The 'neural' phonetic typewriter , 1988, Computer.

[17]  S. M. Peeling,et al.  Isolated digit recognition experiments using the multi-layer perceptron , 1988, Speech Commun..

[18]  Alex Waibel,et al.  Phoneme recognition: neural networks vs. hidden Markov models vs. hidden Markov models , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[19]  Raj Reddy,et al.  Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .

[20]  D. Broomhead,et al.  Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks , 1988 .

[21]  Richard Lippmann,et al.  Review of Neural Networks for Speech Recognition , 1989, Neural Computation.

[22]  John Scott Bridle,et al.  Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition , 1989, NATO Neurocomputing.

[23]  Hervé Bourlard,et al.  A Continuous Speech Recognition System Embedding MLP into HMM , 1989, NIPS.

[24]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[25]  Lotfi A. Zadeh,et al.  Phonological structures for speech recognition , 1989 .

[26]  Alexander H. Waibel,et al.  Connectionist Architectures for Multi-Speaker Phoneme Recognition , 1989, NIPS.

[27]  Paul J. Werbos,et al.  Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.

[28]  Mitch Weintraub,et al.  The decipher speech recognition system , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[29]  H. Gish,et al.  A probabilistic approach to the understanding and training of neural network classifiers , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[30]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[31]  John S. Bridle,et al.  Alpha-nets: A recurrent 'neural' network architecture with a hidden Markov model interpretation , 1990, Speech Commun..

[32]  Alex Waibel,et al.  Large vocabulary recognition using linked predictive neural networks , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[33]  Esther Levin,et al.  Word recognition using hidden control neural architecture , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[34]  H. Bourlard,et al.  Links Between Markov Models and Multilayer Perceptrons , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[35]  A. Waibel,et al.  Connectionist Viterbi training: a new hybrid method for continuous speech recognition , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[36]  H.B.D. Sorensen,et al.  A cepstral noise reduction multi-layer neural network , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[37]  Hynek Hermansky,et al.  Continuous speech recognition using PLP analysis with multilayer perceptrons , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[38]  Alex Waibel,et al.  Connectionist speaker normalization and its applications to speech recognition , 1991, Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop.

[39]  Frank Fallside,et al.  A recurrent error propagation network speech recognition system , 1991 .

[40]  Richard Lippmann,et al.  Neural Network Classifiers Estimate Bayesian a posteriori Probabilities , 1991, Neural Computation.

[41]  Jeff A. Bilmes,et al.  The Ring Array Processor: A Multiprocessing Peripheral for Connection Applications , 1992, J. Parallel Distributed Comput..

[42]  Horacio Franco,et al.  Context-Dependent Multiple Distribution Phonetic Modeling with MLPs , 1992, NIPS.

[43]  Andreas Stolcke,et al.  Hidden Markov Model} Induction by Bayesian Model Merging , 1992, NIPS.

[44]  Vassilios Digalakis,et al.  Segment-based stochastic models of spectral dynamics for continuous speech recognition , 1992 .

[45]  Hervé Bourlard,et al.  CDNN: a context dependent neural network for continuous speech recognition , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[46]  Yochai Konig,et al.  Modeling Consistency in a Speaker Independent Continuous Speech Recognition System , 1992, NIPS.

[47]  Li Deng,et al.  A generalized hidden Markov model with state-conditioned trend functions of time for the speech signal , 1992, Signal Process..

[48]  H. Ney,et al.  Linear discriminant analysis for improved large vocabulary continuous speech recognition , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[49]  Elliot Singer,et al.  A speech recognizer using radial basis function neural networks in an HMM framework , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[50]  Yoshua Bengio,et al.  Global optimization of a neural network-hidden Markov model hybrid , 1992, IEEE Trans. Neural Networks.

[51]  Oded Ghitza,et al.  Hidden Markov models with templates as non-stationary states: an application to speech recognition , 1993, Comput. Speech Lang..

[52]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[53]  Richard Lippmann,et al.  Hybrid neural-network/HMM approaches to wordspotting , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[54]  J. R. Rohlicek,et al.  ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition , 1993, IEEE Trans. Speech Audio Process..

[55]  Michael I. Jordan,et al.  Learning piecewise control strategies in a modular neural network architecture , 1993, IEEE Trans. Syst. Man Cybern..

[56]  Yochai Konig,et al.  A neural network based, speaker independent, large vocabulary, continuous speech recognition system: the WERNICKE project , 1993, EUROSPEECH.

[57]  Steven Greenberg,et al.  Stochastic perceptual auditory-event-based models for speech recognition , 1994, ICSLP.

[58]  Horacio Franco,et al.  Context-dependent connectionist probability estimation in a hybrid hidden Markov model-neural net speech recognition system , 1994, Comput. Speech Lang..

[59]  Andreas Stolcke,et al.  The berkeley restaurant project , 1994, ICSLP.

[60]  Hervé Bourlard,et al.  Connectionist probability estimators in HMM speech recognition , 1994, IEEE Trans. Speech Audio Process..

[61]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[62]  Hervé Bourlard,et al.  Parallel training of MLP probability estimators for speech recognition: a gender-based approach , 1994, Proceedings of IEEE Workshop on Neural Networks for Signal Processing.

[63]  Jayant M. Naik,et al.  Connected digit recognition using connectionist probability estimators and mixture-Gaussian densities , 1994, ICSLP.

[64]  George Zavaliagkos,et al.  A hybrid segmental neural net/hidden Markov model system for continuous speech recognition , 1994, IEEE Trans. Speech Audio Process..

[65]  Yochai Konig,et al.  REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition , 1995, NIPS.