Artificial and online acquired noise dictionaries for noise robust ASR

Recent research has shown that speech can be sparsely represented using a dictionary of exemplars, i.e. speech segments spanning multiple frames, and that such a sparse representation can be recovered using Compressed Sensing techniques. In previous work we proposed a method for noise-robust automatic speech recognition in which noisy speech is modelled as a sparse linear combination of speech and noise exemplars extracted from the training data. The weights of the speech exemplars were then used to provide noise-robust HMM-state likelihoods. In this work we propose two extensions: acquiring additional noise exemplars during decoding, and using an artificially constructed noise dictionary. Experiments on AURORA-2 show that the artificial noise dictionary works better for noises not seen during training and that acquiring additional exemplars can improve recognition accuracy.
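
The idea of modelling noisy speech as a sparse linear combination of speech and noise exemplars can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function name sparse_activations, the toy four-dimensional exemplars, the Euclidean cost with an L1 penalty, and the multiplicative update rule used to keep the weights non-negative. The actual system may use a different cost function, feature representation, and solver.

```python
# Illustrative sketch only: non-negative sparse coding with multiplicative
# updates, minimising ||y - A x||^2 + sparsity * sum(x) subject to x >= 0.
# All names and toy values here are hypothetical.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]

def sparse_activations(y, dictionary, sparsity=0.01, iterations=500, eps=1e-12):
    """Estimate non-negative weights x so that dictionary * x approximates y.

    `dictionary` has one row per feature dimension and one column per exemplar.
    """
    dict_t = transpose(dictionary)
    x = [1.0] * len(dict_t)          # start from uniform positive weights
    numer = matvec(dict_t, y)        # A^T y stays fixed across iterations
    for _ in range(iterations):
        denom = matvec(dict_t, matvec(dictionary, x))   # A^T A x
        x = [xi * n / (d + sparsity + eps)
             for xi, n, d in zip(x, numer, denom)]
    return x

# Toy exemplars (hypothetical values), each a flattened feature vector.
speech_exemplars = [[1.0, 0.5, 0.0, 0.0], [0.0, 1.0, 0.5, 0.0]]
noise_exemplars = [[0.0, 0.0, 0.5, 1.0], [0.5, 0.0, 0.0, 1.0]]

# Concatenate speech and noise exemplars into one dictionary (columns = exemplars).
dictionary = transpose(speech_exemplars + noise_exemplars)

# A noisy observation built from one speech exemplar and one noise exemplar.
noisy = [2.0 * s + 1.0 * n
         for s, n in zip(speech_exemplars[0], noise_exemplars[0])]

weights = sparse_activations(noisy, dictionary)
speech_weights = weights[:len(speech_exemplars)]
noise_weights = weights[len(speech_exemplars):]
print("speech weights:", speech_weights)
print("noise weights:", noise_weights)
```

In a sketch like this, only the speech weights would be passed on to the recogniser, while the noise weights absorb the part of the observation explained by noise. Adding noise exemplars acquired during decoding would correspond to appending further columns to the noise part of the dictionary.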
