Robust speech recognition using dynamic noise adaptation

Dynamic noise adaptation (DNA) [1, 2] is a model-based technique for improving automatic speech recognition (ASR) performance in noise. DNA has shown promise on artificially mixed data such as the Aurora II and DNA+Aurora II tasks [1], where it significantly outperforms well-known techniques such as the ETSI AFE and fMLLR [2], but it has never been tried on real data. In this paper, we present new results generated by commercial-grade ASR systems trained on large amounts of data. We show that DNA improves upon the spectral subtraction (SS) and stochastic fMLLR algorithms of our embedded recognizers, particularly in unseen noise conditions, and describe how DNA has been adapted for deployment in low-latency ASR systems. DNA improves our best embedded system, which uses SS, fMLLR, and fMPE [3], by over 22% relative at SNRs below 6 dB, reducing the word error rate in these adverse conditions from 4.24% to 3.29%.
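As a quick check on the headline figure, the relative improvement can be recomputed from the two word error rates quoted above. The following is a minimal sketch; the function name `relative_wer_reduction` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER.

    Hypothetical helper for illustration; both arguments are WERs in percent.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# WER figures quoted in the abstract for SNRs below 6 dB.
reduction = relative_wer_reduction(4.24, 3.29)
print(f"{reduction:.1f}% relative")
```

With the quoted values this comes to roughly 22.4%, consistent with the "over 22% relative" claim.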

[1] Richard M. Stern, et al. A vector Taylor series approach for environment-independent speech recognition, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[2] Enrico Bocchieri, et al. Vector quantization for the efficient computation of continuous density likelihoods, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3] Pierre L. Dognin, et al. Beyond linear transforms: efficient non-linear dynamic adaptation for noise robust speech recognition, 2008, INTERSPEECH.

[4] Li Deng, et al. Recursive estimation of nonstationary noise using iterative stochastic approximation for robust speech recognition, 2003, IEEE Trans. Speech Audio Process.

[5] Biing-Hwang Juang, et al. On the application of hidden Markov models for enhancing noisy speech, 1989, IEEE Trans. Acoust. Speech Signal Process.

[6] Mark J. F. Gales, et al. Robust continuous speech recognition using parallel model combination, 1996, IEEE Trans. Speech Audio Process.

[7] Peder A. Olsen, et al. Dynamic Noise Adaptation, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[8] Alex Acero, et al. Noise robust speech recognition with a switching linear dynamic model, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9] Michael Picheny, et al. Speech recognition using noise-adaptive prototypes, 1989, IEEE Trans. Acoust. Speech Signal Process.

[10] Li Deng, et al. Enhancement of log Mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise, 2004, IEEE Transactions on Speech and Audio Processing.

[11] Brendan J. Frey, et al. ALGONQUIN: iterating Laplace's method to remove multiple types of acoustic distortion for robust speech recognition, 2001, INTERSPEECH.

[12] Geoffrey Zweig, et al. fMPE: discriminatively trained features for speech recognition, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.