Improved modulation spectrum enhancement methods for robust speech recognition

In this paper, we present two novel algorithms to improve the noise robustness of speech recognition features: modulation spectrum replacement (MSR) and modulation spectrum filtering (MSF). The magnitude spectra of the feature streams are updated using information collected from the clean training set, and the resulting feature streams are more robust to noise, which yields higher recognition accuracy. In experiments on the Aurora-2 noisy digit database, we show that MSR achieves an average relative error reduction of nearly 57% compared with baseline processing, and that MSF is particularly effective at enhancing features already preprocessed by conventional feature normalization methods, giving even better recognition accuracy under noise-corrupted conditions.
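The abstract says only that the magnitude spectra of feature streams are updated using information from the clean training set. The sketch below shows one plausible reading of the MSR idea in Python with NumPy; it is not the authors' implementation. Several details are assumptions made for illustration: the reference statistic is the mean clean magnitude per feature dimension, streams are zero-padded to a common length `n_fft`, and the noisy phase is kept. All function names (`modulation_spectrum`, `reference_magnitude`, `replace_magnitude`, `relative_error_reduction`) are hypothetical. MSF is not sketched because the abstract does not say how its filter is designed. The last helper gives the conventional definition of relative error reduction, the metric behind figures such as the 57% above.

```python
import numpy as np


def modulation_spectrum(stream, n_fft):
    """One-sided DFT of a feature stream (the frame-by-frame trajectory of one
    feature dimension). Zero-padding to n_fft points is an assumption that
    lets streams of different lengths share the same modulation-frequency bins."""
    return np.fft.rfft(np.asarray(stream, dtype=float), n=n_fft)


def reference_magnitude(clean_streams, n_fft):
    """Assumed reference statistic: the mean modulation-spectrum magnitude
    over all clean training streams of one feature dimension."""
    mags = [np.abs(modulation_spectrum(s, n_fft)) for s in clean_streams]
    return np.mean(mags, axis=0)


def replace_magnitude(stream, ref_mag, n_fft):
    """MSR-style update (hypothetical sketch): keep the phase of the noisy
    stream, swap its magnitude for the clean reference, and return to the
    time (frame) domain."""
    spec = modulation_spectrum(stream, n_fft)
    new_spec = ref_mag * np.exp(1j * np.angle(spec))
    # Truncate to the original length; the zero-padded tail is discarded,
    # so the output matches the reference magnitude only approximately.
    return np.fft.irfft(new_spec, n=n_fft)[: len(stream)]


def relative_error_reduction(baseline_err, new_err):
    """Conventional relative error reduction: the fraction of baseline errors removed."""
    return (baseline_err - new_err) / baseline_err


if __name__ == "__main__":
    # Synthetic, hypothetical data: no Aurora-2 features are used here.
    rng = np.random.default_rng(0)
    n_fft = 256
    clean = [np.sin(0.2 * np.arange(n)) for n in (180, 200, 220)]
    ref = reference_magnitude(clean, n_fft)
    noisy = np.sin(0.2 * np.arange(200)) + 0.5 * rng.standard_normal(200)
    enhanced = replace_magnitude(noisy, ref, n_fft)
    print(enhanced.shape)  # prints (200,)
```

Keeping the noisy phase while imposing a clean magnitude is the usual way to "replace" a spectrum's magnitude. In practice such an update would be applied separately to each feature dimension of each utterance, and the paper's actual reference statistic may differ.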
