Missing Data Solutions for Robust Speech Recognition

Current automatic speech recognisers rely to a large extent on statistical models learned from training data. When they are deployed in conditions that differ from those observed in the training data, the generative models are unable to explain the incoming data, and accuracy suffers. A particularly noticeable effect is the deterioration caused by background noise. In the MIDAS project, the state of the art in noise robustness was advanced on two fronts, both making use of the missing data approach. First, novel sparse exemplar-based representations of speech were proposed, and compressed sensing techniques were used to impute noise-corrupted data from exemplars. Second, a missing data approach was adopted in the context of a large vocabulary speech recogniser, resulting in increased robustness at high noise levels without compromising accuracy at low noise levels. The performance of the missing data recogniser was compared with that of the Nuance VOCON-3200 recogniser in a variety of noise conditions observed in field data.
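
To make the idea of imputing noise-corrupted data from exemplars more concrete, the following is a minimal sketch and not the MIDAS implementation. It assumes a spectral feature vector `y`, a binary reliability mask `mask` (True where speech dominates the noise), and a dictionary `exemplars` whose columns are clean speech exemplars; all of these names, the toy dimensions, and the regularisation weight `lam` are illustrative assumptions. A sparse, non-negative combination of exemplars is fitted to the reliable components only, here with a generic iterative soft-thresholding solver, and that combination is then used to fill in the unreliable components.

```python
import numpy as np

def sparse_impute(y, mask, exemplars, lam=0.01, n_iter=2000):
    """Impute unreliable entries of y from a sparse exemplar combination.

    y         : (D,) noisy feature vector (hypothetical spectral features)
    mask      : (D,) boolean, True where the entry is considered reliable
    exemplars : (D, N) dictionary with one clean exemplar per column
    lam       : sparsity weight (illustrative value, not from the source)
    """
    A_r = exemplars[mask]          # dictionary rows at reliable positions
    y_r = y[mask]                  # reliable observations
    # Step size from the Lipschitz constant of the least-squares gradient.
    lipschitz = np.linalg.norm(A_r, 2) ** 2
    x = np.zeros(exemplars.shape[1])
    for _ in range(n_iter):
        grad = A_r.T @ (A_r @ x - y_r)
        x = x - grad / lipschitz
        # Non-negative soft threshold: promotes sparse, non-negative weights.
        x = np.maximum(x - lam / lipschitz, 0.0)
    estimate = exemplars @ x       # clean-speech estimate for all entries
    imputed = y.copy()
    imputed[~mask] = estimate[~mask]   # keep reliable entries untouched
    return imputed, x

# Toy data: the clean vector is exactly one exemplar; noise adds energy
# to a third of the components, which the mask marks as unreliable.
rng = np.random.default_rng(0)
D, N = 24, 8
exemplars = rng.uniform(0.0, 1.0, size=(D, N))
clean = exemplars[:, 3].copy()
mask = np.ones(D, dtype=bool)
mask[::3] = False                  # every third component is unreliable
y = clean.copy()
y[~mask] += 5.0                    # additive noise energy on those entries

imputed, weights = sparse_impute(y, mask, exemplars)
err_before = np.abs(y - clean).max()
err_after = np.abs(imputed - clean).max()
```

In this toy setting `err_after` should be well below `err_before`, because the reliable components are enough to identify the underlying exemplar. With real speech the mask itself has to be estimated and the exemplars span multiple frames, neither of which this sketch attempts.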
