Using sparse representations for missing data imputation in noise robust speech recognition

Noise robustness of automatic speech recognition benefits from missing data imputation: prior to recognition, the parts of the spectrogram dominated by noise are replaced by clean speech estimates. Especially at low SNRs, each frame contains at best only a few uncorrupted coefficients. This makes frame-by-frame restoration of corrupted feature vectors error-prone, and recognition accuracy will mostly be sub-optimal. In this paper we present a novel imputation technique that works on entire words. A word is sparsely represented in an overcomplete basis of exemplar (clean) speech signals, using only the uncorrupted time-frequency elements of the word. The corrupted elements are replaced by estimates obtained by projecting the sparse representation in the basis. We achieve recognition accuracies of 92% at an SNR of -5 dB using oracle masks on AURORA-2, compared to 61% using a conventional frame-based approach. The performance obtained with estimated masks can be directly related to the proportion of correctly identified uncorrupted coefficients.
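The imputation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each word has already been flattened into a fixed-length feature vector, that the exemplars are the columns of a matrix, and that the sparse representation is found with a lasso objective solved by iterative soft thresholding (ISTA). The function names `ista_lasso` and `impute_word` and the parameters `lam` and `n_iter` are hypothetical choices made for this example.

```python
import numpy as np


def ista_lasso(A, y, lam=0.01, n_iter=2000):
    """Approximately minimise 0.5*||A x - y||^2 + lam*||x||_1 with ISTA.

    ISTA is a stand-in solver chosen for this sketch; the abstract does not
    say which sparse solver the paper uses.
    """
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x


def impute_word(observed, mask, exemplars, lam=0.01):
    """Replace unreliable elements of a word by a sparse exemplar-based estimate.

    observed  : (d,) noisy feature vector of one word (assumed fixed length)
    mask      : (d,) boolean array, True where the element is uncorrupted
    exemplars : (d, n) matrix whose columns are clean exemplar words
    """
    # Fit the sparse weights using only the rows marked as reliable.
    x = ista_lasso(exemplars[mask], observed[mask], lam=lam)
    # Project the sparse representation back onto the full basis.
    estimate = exemplars @ x
    # Keep reliable elements as observed; fill in the rest from the estimate.
    return np.where(mask, observed, estimate), x
```

Calling `impute_word(noisy, mask, exemplars)` returns the restored vector together with the sparse weights. The mask needs at least one reliable element, since the weights are fitted on the reliable rows only.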

Bert Cranen | Jort F. Gemmeke

[1] Balas K. Natarajan, et al. Sparse Approximate Solutions to Linear Systems, 1995, SIAM J. Comput.

[2] Allen Y. Yang, et al. Feature Selection in Face Recognition: A Sparse Representation Perspective, 2007.

[3] Richard Lippmann, et al. Speech recognition by machines and humans, 1997, Speech Commun.

[4] Richard M. Stern, et al. Reconstruction of incomplete spectrograms for robust speech recognition, 2000.

[5] Hugo Van hamme. Handling Time-Derivative Features in a Missing Data Framework for Robust Automatic Speech Recognition, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[6] Jean Paul Haton, et al. On noise masking for automatic missing data speech recognition: A survey and discussion, 2007, Comput. Speech Lang.

[7] David L. Donoho, et al. Compressed sensing, 2006, IEEE Transactions on Information Theory.

[8] H. Van hamme, et al. Robust speech recognition using cepstral domain missing data techniques and noisy masks, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9] Hugo Van hamme, et al. PROSPECT features and their application to missing data techniques for robust speech recognition, 2004, INTERSPEECH.

[10] Daniel P. W. Ellis, et al. Decoding speech in the presence of other sources, 2005, Speech Commun.

[11] Phil D. Green, et al. Missing data theory, spectral subtraction and signal-to-noise estimation for robust ASR: an integrated study, 1999, EUROSPEECH.

[12] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[13] Phil D. Green, et al. Robust automatic speech recognition with missing and unreliable acoustic data, 2001, Speech Commun.

[14] Yin Zhang. When is missing data recoverable?, 2006.

[15] Richard M. Stern, et al. A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition, 2004, Speech Commun.

[16] E. J. Candès. Compressive Sampling, 2006.

[17] R. Tibshirani, et al. Least angle regression, 2004, arXiv:math/0406456.

[18] D. Donoho. For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution, 2006.

[19] David Pearce, et al. The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, 2000, INTERSPEECH.