Model-based approaches to handling additive noise in reverberant environments

Model-based approaches for handling additive and convolutional noise have been extensively investigated and used. However, applying these schemes to reverberant noise has received less attention. This paper examines the extension of two standard additive/convolutional noise approaches to handle reverberant noise. The first, reverberant VTS, extends vector Taylor series (VTS) compensation by using a mismatch function that includes reverberant noise. The second modifies constrained maximum likelihood linear regression (CMLLR) so that a wide span of frames is taken into account and “projected” into the required dimensionality. To allow additive noise to be handled, both schemes are combined with standard VTS. The approaches are evaluated and compared on two tasks: MC-WSJ-AV and a simulated reverberant version of AURORA-4.
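
To make the idea of a mismatch function concrete, the sketch below contrasts the standard log-spectral form commonly used with VTS, y = log(exp(x + h) + exp(n)), with one possible reverberant extension in which several preceding clean-speech frames contribute to the current observation. It is an illustration under stated assumptions, not the formulation used in the paper: it works on a single filter-bank channel in the log-spectral domain, ignores the cepstral transform and any phase or cross terms, and the function names and the per-delay log-gains `h_taps` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def vts_mismatch(x, h, n):
    """Standard log-spectral mismatch: y = log(exp(x + h) + exp(n)).

    x: clean speech log-energy, h: convolutional (channel) term,
    n: additive noise log-energy, all for one frame and one channel.
    """
    return logsumexp([x + h, n])


def reverberant_mismatch(x_frames, h_taps, n):
    """Assumed reverberant form: y_t = log(sum_d exp(x_{t-d} + h_d) + exp(n_t)).

    x_frames[d] is the clean log-energy at frame t-d and h_taps[d] is a
    hypothetical log-gain for delay d (d = 0 is the current frame).
    """
    if len(x_frames) != len(h_taps):
        raise ValueError("x_frames and h_taps must have the same length")
    terms = [x + h for x, h in zip(x_frames, h_taps)]
    terms.append(n)
    return logsumexp(terms)


if __name__ == "__main__":
    # With a single tap, the reverberant form reduces to the standard one.
    print(vts_mismatch(1.0, 0.5, -2.0))
    print(reverberant_mismatch([1.0], [0.5], -2.0))
    # Adding delayed frames raises the predicted observation energy.
    print(reverberant_mismatch([1.0, 0.8, 0.6], [0.5, -0.5, -1.5], -2.0))
```

With only one tap the two functions agree, which mirrors how a reverberant scheme can be viewed as a generalisation of the standard one; the additive term `n` enters both in the same way.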
