Recognition of Reverberant Speech by Missing Data Imputation and NMF Feature Enhancement

The problem of reverberation in speech recognition is addressed in this study by extending a noise-robust feature enhancement method based on non-negative matrix factorization. The signal model of the observation as a linear combination of sample spectrograms is augmented by a mel-spectral feature domain convolution to account for the effects of room reverberation. The proposed method is contrasted with missing data techniques for reverberant speech, and evaluated for speech recognition performance using the REVERB challenge corpus. Our results indicate consistent gains in recognition performance compared to the baseline system, with a relative improvement in word error rate of 42.6% for the optimal case.
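
The signal model described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes single-frame dictionary atoms and a separate non-negative reverberation filter for each mel band; the abstract specifies neither detail. The name `reverberant_model`, the matrices `S`, `A`, `R`, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def reverberant_model(S, A, R):
    """Sketch of a reverberant observation model (illustrative assumptions only).

    S: (bands, atoms) non-negative dictionary of sample spectra
    A: (atoms, frames) non-negative activations
    R: (bands, taps) non-negative per-band reverberation filter
    Returns Y with Y[b, t] = sum_tau R[b, tau] * (S @ A)[b, t - tau].
    """
    X = S @ A  # clean-speech estimate as a linear combination of atoms
    n_frames = X.shape[1]
    Y = np.zeros_like(X)
    for tau in range(min(R.shape[1], n_frames)):
        # Delay the clean estimate by tau frames and weight it per band.
        Y[:, tau:] += R[:, tau:tau + 1] * X[:, :n_frames - tau]
    return Y

# Hypothetical sizes: 23 mel bands, 50 atoms, 100 frames, 8 filter taps.
rng = np.random.default_rng(0)
S = rng.random((23, 50))
A = rng.random((50, 100))
R = np.tile(np.exp(-0.5 * np.arange(8)), (23, 1))  # decaying filter per band
Y = reverberant_model(S, A, R)
```

In this sketch the convolution runs along the time axis independently in each mel band, so a filter with a single unit tap reduces the model to the plain non-negative linear combination `S @ A`.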
