Voice Activity Detector and Noise Trackers for Speech Recognition System in Noisy Environment

It is well known that the performance of speech recognition systems degrades drastically in adverse environments, and additive noise is one of the major components of such environments. Detecting voiced, unvoiced, or silent segments of a speech signal in noise is not an easy task. A voice activity detector (VAD) is effective when the noise is stationary, but it often fails when the noise statistics change while speech is present. Moreover, accurate voice activity detection under very low signal-to-noise ratio (SNR) conditions is not trivial. The noise estimate can have a major impact on the quality of the enhanced signal. If the noise estimate is too low, annoying residual noise will be audible. If it is too high, speech will be distorted, possibly resulting in a loss of intelligibility. Both VAD methods and noise-tracking approaches therefore need to be applied to enhance speech signals captured through a microphone in human-computer interaction. VAD-based approaches may work satisfactorily in stationary noise (e.g., white noise). However, they may not work well in more realistic environments (e.g., a restaurant), where the spectral characteristics of the noise may change constantly. Hence the noise spectrum needs to be updated continuously over time, which can be done using noise-tracking algorithms. This paper presents several voice activity detection (VAD) and noise-tracking approaches that help to improve the performance of speech recognition systems in adverse environments for human-computer interaction.
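The following Python sketch contrasts a VAD-gated noise estimate with a continuously updated noise tracker on synthetic per-bin power spectra. It is illustrative only and is not the method of this paper. The following are all assumptions:

* the energy-threshold VAD;
* the smoothing constants `alpha`, `threshold`, and `window`;
* the hypothetical helper names `vad_gated_noise` and `min_stats_noise`.

The tracker is a heavily simplified form of minimum-statistics noise estimation. It takes the per-bin minimum of a recursively smoothed periodogram over a sliding window, without the optimal smoothing and bias compensation of the full method.

```python
"""Toy comparison of a VAD-gated noise estimate and a minimum-tracking noise estimate.

Illustrative assumptions only (not the method of this paper):
- input is a list of per-frame power spectra, each a list of floats;
- the VAD is a crude energy threshold relative to the current noise estimate;
- the tracker is a simplified minimum-statistics scheme with no bias compensation.
"""
from collections import deque


def vad_gated_noise(frames, alpha=0.9, threshold=3.0):
    """Recursively average the noise PSD, updating it only in frames the VAD calls non-speech.

    Returns a list of (is_speech, noise_estimate) pairs, one per frame.
    """
    if not frames:
        return []
    noise = list(frames[0])  # assumption: the first frame contains noise only
    history = []
    for frame in frames:
        # Crude energy VAD: "speech" if frame energy exceeds threshold x noise energy.
        is_speech = sum(frame) > threshold * sum(noise)
        if not is_speech:
            # Update the noise PSD only during detected pauses.
            noise = [alpha * n + (1.0 - alpha) * p for n, p in zip(noise, frame)]
        history.append((is_speech, list(noise)))
    return history


def min_stats_noise(frames, alpha=0.8, window=8):
    """Estimate noise as the per-bin minimum of the smoothed power over the last `window` frames.

    No VAD decision is used, so the estimate keeps updating while speech is present.
    """
    if not frames:
        return []
    smoothed = list(frames[0])
    recent = deque(maxlen=window)  # smoothed spectra of the most recent frames
    estimates = []
    for frame in frames:
        # First-order recursive smoothing of the noisy power spectrum.
        smoothed = [alpha * s + (1.0 - alpha) * p for s, p in zip(smoothed, frame)]
        recent.append(smoothed)
        # Per-bin minimum over the sliding window.
        estimates.append([min(bin_values) for bin_values in zip(*recent)])
    return estimates


if __name__ == "__main__":
    # Toy scenario (made-up numbers): 4 frequency bins, stationary noise of power 1.0,
    # a short speech burst in frames 10-12, then the noise level jumps to 4.0 at frame 30.
    levels = [1.0] * 10 + [51.0] * 3 + [1.0] * 17 + [4.0] * 40
    frames = [[level] * 4 for level in levels]

    vad_history = vad_gated_noise(frames)
    tracked = min_stats_noise(frames)
    print("VAD-gated estimate at end:", round(vad_history[-1][1][0], 3))
    print("Minimum-tracking estimate at end:", round(tracked[-1][0], 3))
```

In this toy run the energy VAD labels every frame of the louder noise as speech. The gated estimate therefore stays at the old level, which is an underestimate that would leave audible residual noise. The minimum tracker instead converges to the new level after roughly one window of frames. The price of the minimum approach is a delay of about `window` frames. With real, fluctuating spectra it also has a downward bias that practical trackers must compensate.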
