Speaker Independent Connected Digit Recognition Using VQ and HMM in Additive Noise Environment

This paper examines how effectively concatenated perceptual features, combined with a noise reduction technique based on the wavelet transform and recursive least squares (RLS) filtering, achieve a good recognition rate for particular combinations of connected digits in an additive noise environment. The proposed concatenated perceptual features are extracted from the speech and vector quantized to obtain codebook indices. The expectation-maximization algorithm is used to train discrete HMMs for the connected digits. The speech recognition system is evaluated on clean and noisy test utterances, and the recognized digit is the one whose model gives the maximum log-likelihood value. The utterances for this work are randomly chosen from the “TI Digits_1” and “TI Digits_2” databases. The concatenated perceptual features yield accuracies of 81.4% and 73% for the connected-digit sets (10-19) and (12-19, 21, 31, 41, 51, 61, 71, 81, 91), respectively. Pink, white, babble, and factory noise are considered in this work.
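The recognition step described above (quantize feature vectors to codebook indices, score the index sequence against each digit's discrete HMM, pick the model with the maximum log likelihood) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-entry codebook, the two toy models labelled "a" and "b", and all function names are hypothetical, and the EM training step, the perceptual feature extraction, and the noise reduction front end are omitted.

```python
import math

def to_log(p):
    """Log of a probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else -math.inf

def make_model(pi, A, B):
    """Store a discrete HMM (initial, transition, emission) in the log domain."""
    return ([to_log(p) for p in pi],
            [[to_log(p) for p in row] for row in A],
            [[to_log(p) for p in row] for row in B])

def quantize(frames, codebook):
    """Vector quantization: map each frame to its nearest codeword index."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(f, codebook[k]))
            for f in frames]

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, model):
    """Forward algorithm in the log domain: log P(obs | model)."""
    log_pi, log_A, log_B = model
    n = len(log_pi)
    alpha = [log_pi[i] + log_B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [logsumexp([alpha[i] + log_A[i][j] for i in range(n)]) + log_B[j][o]
                 for j in range(n)]
    return logsumexp(alpha)

def recognize(obs, models):
    """Return the label of the model with the maximum log likelihood, plus all scores."""
    scores = {label: log_likelihood(obs, m) for label, m in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy setup: 2 codewords, 2 states, 2 candidate models.
codebook = [(0.0, 0.0), (1.0, 1.0)]
uniform_A = [[0.5, 0.5], [0.5, 0.5]]
models = {
    "a": make_model([0.5, 0.5], uniform_A, [[0.9, 0.1], [0.8, 0.2]]),  # favors codeword 0
    "b": make_model([0.5, 0.5], uniform_A, [[0.1, 0.9], [0.2, 0.8]]),  # favors codeword 1
}

frames = [(0.1, -0.1), (0.2, 0.1), (0.9, 1.1), (0.0, 0.1)]
indices = quantize(frames, codebook)
label, scores = recognize(indices, models)
print(indices, label)
```

In a real system each connected-digit class would have its own trained model, and the codebook would be learned from training features rather than fixed by hand.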
