Neural networks for automatic speech recognition: a review

Most current automatic speech recognition (ASR) systems are based on stochastic models, particularly hidden Markov models (HMMs). Over the past ten years, however, a number of projects have explored a different class of models: connectionist artificial neural networks (ANNs).
