Using Gaussian Mixtures for Hindi Speech Recognition System

The goal of an automatic speech recognition (ASR) system is to convert a speech signal into text accurately and efficiently, independent of the device, speaker, or environment. In general, the speech signal is captured and pre-processed at the front end for feature extraction, and evaluated at the back end using a Gaussian mixture hidden Markov model. In this statistical approach, the evaluation of Gaussian likelihoods dominates the total computational load, so the number of Gaussian mixtures must be chosen carefully in relation to the amount of training data. Because only small databases are available for training ASR systems for Indian languages, the larger numbers of Gaussian mixtures (i.e. 64 and above) normally used for European languages cannot be applied to them. This paper reviews the statistical framework and presents an iterative procedure for selecting the number of Gaussian mixtures that yields maximum accuracy in a Hindi speech recognition system.
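The abstract does not spell out the iterative procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: a one-dimensional Gaussian mixture is fitted by EM for each candidate mixture count, and the count that scores best on held-out data is kept. The held-out average log-likelihood stands in for the recognition accuracy the paper optimizes, and the function names (`fit_gmm`, `select_num_mixtures`), the candidate list, and the synthetic data are all hypothetical, not taken from the paper.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_gmm(data, k, iters=30, var_floor=1e-3):
    """Fit a 1-D GMM with k components by EM (illustrative, not the paper's trainer)."""
    srt = sorted(data)
    n = len(srt)
    # Deterministic initialization: means at evenly spaced quantiles.
    means = [srt[int((i + 0.5) * n / k)] for i in range(k)]
    mu = sum(data) / n
    gvar = max(sum((x - mu) ** 2 for x in data) / n, var_floor)
    variances = [gvar] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-10)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor)
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights, means, variances


def avg_loglik(data, model):
    """Average per-sample log-likelihood of data under a fitted GMM."""
    weights, means, variances = model
    return sum(
        logsumexp([math.log(w) + log_gauss(x, m, v)
                   for w, m, v in zip(weights, means, variances)])
        for x in data) / len(data)


def select_num_mixtures(train, heldout, candidates=(1, 2, 4, 8)):
    """Try each candidate mixture count and keep the best held-out score."""
    scores = {k: avg_loglik(heldout, fit_gmm(train, k)) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    random.seed(0)
    # Synthetic bimodal data standing in for acoustic features (assumption).
    draw = lambda: random.gauss(-3.0, 1.0) if random.random() < 0.5 else random.gauss(3.0, 1.0)
    train = [draw() for _ in range(200)]
    heldout = [draw() for _ in range(200)]
    best_k, scores = select_num_mixtures(train, heldout)
    print("selected mixtures:", best_k)
    for k in sorted(scores):
        print(k, round(scores[k], 3))
```

In a real recognizer the score for each candidate would come from decoding a development set and measuring word accuracy, and the model would be a multivariate GMM attached to each HMM state. The selection loop itself would keep the same shape.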
