Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition

In this work, an exemplar-based technique for the enhancement of noisy speech is proposed. The technique finds a sparse representation of the noisy speech in a dictionary containing both speech and noise exemplars, and uses the activated dictionary atoms to create a time-varying filter that enhances the noisy speech. The speech enhancement algorithm is evaluated using measured signal-to-noise ratio (SNR) improvements as well as automatic speech recognition. Experiments on the PASCAL CHiME challenge corpus, which contains speech corrupted by both reverberation and authentic living-room noise at SNRs ranging from 9 dB to -6 dB, confirm the validity of the proposed technique. Examples of enhanced signals are available at http://www.cs.tut.fi/~tuomasv/.

Index Terms: speech enhancement, exemplar-based, noise robustness, sparse representations
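The abstract outlines the core pipeline: non-negative sparse activations of a combined speech-and-noise exemplar dictionary, followed by a time-varying filter derived from the activated speech atoms. The sketch below illustrates that idea under simplifying assumptions: it operates on magnitude spectrograms, takes pre-built dictionary matrices `D_speech` and `D_noise` as given, and uses a generic sparse KL-NMF multiplicative update as the activation solver. All function names, parameters, and update details here are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def sparse_activations(Y, D, n_iter=200, sparsity=1.0):
    """Non-negative sparse coding of a noisy magnitude spectrogram Y (freq x time)
    against a fixed dictionary D (freq x atoms), using multiplicative updates that
    minimize the generalized KL divergence plus an L1 penalty on the activations."""
    eps = 1e-12
    X = np.random.rand(D.shape[1], Y.shape[1])  # activations (atoms x time)
    for _ in range(n_iter):
        X *= (D.T @ (Y / (D @ X + eps))) / (D.T @ np.ones_like(Y) + sparsity + eps)
    return X

def enhance(Y, D_speech, D_noise, **solver_kwargs):
    """Estimate a time-varying filter from the activated speech and noise atoms
    and apply it to the noisy magnitude spectrogram Y."""
    eps = 1e-12
    D = np.hstack([D_speech, D_noise])       # combined speech + noise dictionary
    X = sparse_activations(Y, D, **solver_kwargs)
    n_s = D_speech.shape[1]
    speech_hat = D_speech @ X[:n_s]          # reconstruction from speech atoms only
    total_hat = D @ X + eps                  # reconstruction from all activated atoms
    mask = speech_hat / total_hat            # Wiener-like time-varying filter in [0, 1]
    return mask * Y                          # enhanced magnitude spectrogram
```

In a full system, the enhanced magnitude would typically be combined with the noisy-speech phase and inverted back to the time domain (or passed as features to a recognizer); those steps, and the construction of the exemplar dictionaries themselves, are outside the scope of this sketch.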
