Histogram equalization of real and imaginary modulation spectra for noise-robust speech recognition

Histogram equalization (HEQ) of acoustic features has received considerable attention in the area of robust speech recognition because of its relative simplicity and good empirical performance. This paper presents a novel HEQ-based feature extraction approach that performs equalization in both the acoustic frequency and modulation frequency domains to obtain more noise-robust features. In particular, the real and imaginary acoustic spectra are first individually transformed to the modulation domain via the discrete Fourier transform (DFT). The HEQ process is then carried out on the corresponding magnitude modulation spectra so as to compensate for the noise distortions. Finally, the equalized modulation spectra are converted back to form the real and imaginary acoustic spectra, respectively. By doing so, we can enhance not only the magnitude but also the phase components of the acoustic spectra, and thereby create more noise-robust cepstral features. The experiments conducted on the Aurora-2 clean-condition database and task reveal that the presented approach delivers superior recognition accuracy in comparison with several other HEQ-related methods and the well-known advanced front-end (AFE) extraction scheme, which supports the potential utility of this novel approach.
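
The three steps described above (DFT of the real and imaginary acoustic spectra along time, HEQ of the magnitude modulation spectra, inverse DFT) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not specify: the input is a complex short-time spectrum of shape (frames, bins), the reference distribution comes from a hypothetical clean-speech spectrum `ref_spec`, equalization is applied separately per acoustic frequency bin by rank-based quantile matching, and the modulation phase is left unchanged. The function names `heq`, `equalize_part`, and `heq_real_imag` are illustrative.

```python
import numpy as np

def heq(values, reference):
    """Map `values` so that their empirical distribution follows `reference`.

    Rank-based quantile matching (an assumed HEQ variant): each value is
    replaced by the reference quantile at its own empirical CDF position.
    """
    flat = values.ravel()
    ranks = np.argsort(np.argsort(flat))        # rank 0..n-1 of each value
    cdf = ranks / max(flat.size - 1, 1)         # empirical CDF in [0, 1]
    return np.quantile(reference.ravel(), cdf).reshape(values.shape)

def equalize_part(part, ref_part):
    """Equalize one real-valued part (real or imaginary) of the spectrum.

    part, ref_part: arrays of shape (frames, bins); frame counts may differ.
    """
    # DFT along the time axis gives the modulation spectrum of each bin.
    mod = np.fft.rfft(part, axis=0)
    ref_mag = np.abs(np.fft.rfft(ref_part, axis=0))
    mag, phase = np.abs(mod), np.angle(mod)

    # HEQ on the magnitude modulation spectrum, one acoustic bin at a time.
    new_mag = np.empty_like(mag)
    for k in range(mag.shape[1]):
        new_mag[:, k] = heq(mag[:, k], ref_mag[:, k])

    # Recombine with the original modulation phase (assumption) and invert.
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=part.shape[0], axis=0)

def heq_real_imag(spec, ref_spec):
    """Apply modulation-domain HEQ to the real and imaginary parts separately."""
    real = equalize_part(spec.real, ref_spec.real)
    imag = equalize_part(spec.imag, ref_spec.imag)
    return real + 1j * imag
```

Because the real and imaginary parts are equalized independently, both the magnitude and the phase of the resulting acoustic spectrum can change, which matches the motivation stated in the abstract. Cepstral features would then be computed from the returned spectrum; that step is omitted here.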
