Mask Estimation for Missing Data Recognition using Background Noise Sniffing

This paper addresses the problem of spectrographic mask estimation in the context of missing data recognition. Unlike other denoising methods, missing data recognition does not match the whole spectrum against the acoustic models; instead, it treats some time-frequency pixels as missing, i.e. corrupted by noise. Missing data recognizers depend heavily on a correct estimate of these "masks", which identify the corrupted pixels. To address this difficult problem, we propose a new approach that exploits a priori knowledge about the masks observed in typical noisy environments: one noise-dependent mask is estimated for each environment, and the proposed mask is obtained by combining these noise-dependent masks. The combination is driven by an environmental "sniffing" module that estimates the probability of being in each typical noisy condition. This mask estimation procedure has been integrated into a complete missing data recognizer based on bounded marginalization. Our approach is evaluated on the Aurora2 database.
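As an illustration of the combination step, the following minimal sketch blends per-environment masks using the probabilities produced by a sniffing module. The abstract does not state the combination rule, so the probability-weighted average used here is an assumption, and the function name `combine_masks`, the environment labels, and the mask layout (a list of frames, each a list of frequency channels) are hypothetical.

```python
def combine_masks(env_masks, env_probs):
    """Blend noise-dependent masks with environment probabilities.

    ASSUMPTION: the combination is a probability-weighted average;
    the abstract only says the masks are "combined".

    env_masks: dict mapping an environment name to a mask stored as
               [frame][channel] values in [0, 1] (1 = reliable pixel).
    env_probs: dict mapping an environment name to its probability,
               as estimated by a (hypothetical) sniffing module.
    """
    total = sum(env_probs.values())
    if total <= 0:
        raise ValueError("environment probabilities must sum to a positive value")

    names = list(env_masks)
    first = env_masks[names[0]]
    n_frames, n_channels = len(first), len(first[0])

    combined = [[0.0] * n_channels for _ in range(n_frames)]
    for name in names:
        # Normalise so the weights sum to one even if the inputs do not.
        weight = env_probs.get(name, 0.0) / total
        mask = env_masks[name]
        for t in range(n_frames):
            for f in range(n_channels):
                combined[t][f] += weight * mask[t][f]
    return combined


# Two hypothetical environments, two frames, two frequency channels.
masks = {
    "car": [[1.0, 0.0], [1.0, 1.0]],
    "babble": [[0.0, 0.0], [1.0, 0.0]],
}
probs = {"car": 0.75, "babble": 0.25}
combined = combine_masks(masks, probs)
print(combined)  # [[0.75, 0.0], [1.0, 0.75]]
```

The result is a soft mask: pixels on which all environments agree keep a value of 0 or 1, while disputed pixels take an intermediate value reflecting how likely each environment is. A recognizer that requires a binary mask could threshold these values, which is again an assumption rather than something the abstract specifies.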
