An Overview of Speech Recognition Systems

This chapter introduces automatic speech recognition systems. It presents the mathematical formulation of a speech recognizer and introduces the main components of such systems: front-end signal processing, acoustic models, language models, the pronunciation dictionary, decoding, and training. It also discusses the Viterbi and Baum–Welch algorithms as the fundamental techniques for the decoding and training phases, respectively, and provides a brief literature review of speech recognition systems.
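As a rough illustration of the decoding step mentioned above, the following is a minimal sketch of the Viterbi algorithm for a small discrete hidden Markov model. It is not taken from the chapter: the function name `viterbi`, the state names `s1` and `s2`, the symbols `a`, `b`, `c`, and all probabilities are made-up assumptions chosen only to keep the example small. A real recognizer would score acoustic feature vectors rather than discrete symbols.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    All probabilities are given as natural logarithms to avoid underflow.
    """
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical toy model: two hidden states, three observable symbols.
states = ["s1", "s2"]
log_start = to_log({"s1": 0.6, "s2": 0.4})
log_trans = to_log({"s1": {"s1": 0.7, "s2": 0.3},
                    "s2": {"s1": 0.4, "s2": 0.6}})
log_emit = to_log({"s1": {"a": 0.5, "b": 0.4, "c": 0.1},
                   "s2": {"a": 0.1, "b": 0.3, "c": 0.6}})

log_prob, path = viterbi(["a", "b", "c"], states, log_start, log_trans, log_emit)
print(path, math.exp(log_prob))
```

For this toy model the most likely state sequence is `s1, s1, s2`, with a path probability of about 0.01512. Working in the log domain is a common design choice here, since products of many small probabilities underflow quickly on long utterances.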
