Feature Enhancement for Noisy Speech Recognition With a Time-Variant Linear Predictive HMM Structure

This paper presents a new approach to speech feature enhancement in the log-spectral domain for noisy speech recognition. A switching linear dynamic model (SLDM) is explored as a parametric model of the clean speech distribution. Each multivariate linear dynamic model (LDM) is associated with a hidden state of a hidden Markov model (HMM), in an attempt to describe the temporal correlations among adjacent frames of speech features. A state transition on the Markov chain either activates a different LDM or activates several LDMs simultaneously, each with its own probability generated by the HMM. Rather than holding the transition probabilities fixed for the whole process, a connectionist model is employed to learn time-variant transition probabilities. With the resulting SLDM as the speech model, together with a model for the noise, speech and noise are jointly tracked by means of switching Kalman filtering. Comprehensive experiments are carried out on the Aurora2 database to evaluate the new algorithm. The results show that the new SLDM approach further improves speech feature enhancement performance, measured as noise-robust recognition accuracy, because the transition probabilities among the LDMs can be described more precisely at each time point.
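
The idea of time-variant transition probabilities that weight several LDMs can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the method of the paper: the function names, the single-layer softmax gate standing in for the connectionist model, the choice of the previous feature frame as its input, and the toy parameters. The noise model and the switching Kalman filter are omitted entirely.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def transition_probs(x_prev, weights, biases):
    """Hypothetical gating network: one linear layer plus softmax.

    Maps the previous feature frame to one activation probability
    per LDM, so the probabilities can change at every time step.
    """
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w, x_prev)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)

def ldm_predict(x_prev, A, b):
    """Mean prediction of one LDM: x_t = A x_{t-1} + b."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x_prev)) + b_i
        for row, b_i in zip(A, b)
    ]

def sldm_predict(x_prev, ldms, weights, biases):
    """Probability-weighted combination of all LDM predictions."""
    probs = transition_probs(x_prev, weights, biases)
    preds = [ldm_predict(x_prev, A, b) for A, b in ldms]
    mixed = [
        sum(p * pred[d] for p, pred in zip(probs, preds))
        for d in range(len(x_prev))
    ]
    return probs, mixed

# Toy example: two 2-D LDMs (illustrative parameters only).
ldms = [
    ([[0.9, 0.0], [0.0, 0.9]], [0.1, 0.1]),   # slowly decaying dynamics
    ([[0.5, 0.2], [0.2, 0.5]], [0.0, 0.0]),   # coupled dynamics
]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]       # one weight row per LDM
gate_biases = [0.0, 0.0]

probs, x_pred = sldm_predict([1.0, 2.0], ldms, gate_weights, gate_biases)
print(probs, x_pred)
```

Because the gate is re-evaluated at every frame, the weights over the LDMs vary with time, in contrast to a single fixed transition matrix. In a full enhancement system these weights would feed a switching Kalman filter that also tracks the noise.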
