Audio source separation with one sensor for robust speech recognition

In this paper, we address the problem of noise compensation in speech signals for robust speech recognition. Several classical denoising methods in the field of speech and signal processing are compared on speech corrupted by music, which correspond to a frequent situation in broadcast news transcription tasks. We also present two new source separation techniques, namely adaptive Wiener filtering and adaptive shrinkage. These techniques rely on the use of a dictionary of spectral shapes to deal with the non stationarity of the signals. The algorithms are first compared on the source separation task and assessed in terms of average distortion. Their effect on the entire transcription system is eventually compared in terms of word error rate. Results show that the proposed adaptive Wiener filter approach yields a significant improvement of the transcription accuracy at signal/noise ratios greater than 15 dB.

[1]  Elie Laurent Benaroya Séparation de plusieurs sources sonores avec un seul microphone , 2003 .

[2]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[3]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[4]  Maxine Eskénazi,et al.  BREF, a large vocabulary spoken corpus for French , 1991, EUROSPEECH.

[5]  R. Gribonval,et al.  Proposals for Performance Measurement in Source Separation , 2003 .

[6]  Juan Arturo Nolazco-Flores,et al.  Continuous speech recognition in noise using spectral subtraction and HMM adaptation , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[7]  Mark J. F. Gales,et al.  Model-based techniques for noise robust speech recognition , 1995 .

[8]  S. Molau,et al.  Feature space normalization in adverse acoustic conditions , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[9]  David L. Donoho,et al.  De-noising by soft-thresholding , 1995, IEEE Trans. Inf. Theory.