Sub-Band Level Histogram Equalization for Robust Speech Recognition

This paper describes a novel modification of the Histogram Equalization approach to robust speech recognition. We propose equalizing the high-frequency and low-frequency bands separately. We study different combinations of sub-band equalization and obtain the best results when we perform a two-stage equalization. In the first stage, conventional Histogram Equalization (HEQ) is performed on the cepstral features. This does not completely equalize the high-frequency and low-frequency bands, even though the overall histogram is well equalized. In the second stage, the high-frequency and low-frequency components of the equalized cepstra are equalized separately. We refer to this approach as Sub-band Histogram Equalization (S-HEQ). The new set of features has better equalization of the sub-bands as well as of the overall cepstral histogram. Recognition results show relative improvements of 12% and 15% over conventional HEQ on the Aurora-2 and Aurora-4 databases, respectively.
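The two-stage procedure can be illustrated with a minimal sketch for a single cepstral coefficient trajectory. Several choices below are assumptions made only for illustration and are not taken from the paper: the reference distribution is a standard Gaussian, the low band is a centered moving average with the high band as its residual, and the two equalized bands are recombined by summation. The names `heq`, `split_bands`, `s_heq`, and `width` are hypothetical.

```python
from statistics import NormalDist

# Assumed reference distribution; a real system would likely estimate
# the reference from clean training data instead.
_REFERENCE = NormalDist(mu=0.0, sigma=1.0)


def heq(x):
    """Rank-based histogram equalization of a 1-D sequence.

    Each value is mapped through its empirical CDF and then through the
    inverse CDF of the reference distribution.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the CDF value strictly inside (0, 1).
        out[i] = _REFERENCE.inv_cdf((rank + 0.5) / n)
    return out


def split_bands(x, width=5):
    """Split a trajectory into low and high temporal-frequency parts.

    Assumption: low band = centered moving average over `width` frames,
    high band = residual, so low[t] + high[t] == x[t].
    """
    n = len(x)
    half = width // 2
    low = []
    for t in range(n):
        window = x[max(0, t - half):min(n, t + half + 1)]
        low.append(sum(window) / len(window))
    high = [a - b for a, b in zip(x, low)]
    return low, high


def s_heq(x, width=5):
    """Two-stage sketch: full-band HEQ, then per-band HEQ, then sum."""
    stage1 = heq(x)                          # stage 1: conventional HEQ
    low, high = split_bands(stage1, width)   # split the equalized cepstra
    low_eq, high_eq = heq(low), heq(high)    # stage 2: per-band HEQ
    # Assumed recombination: add the two equalized bands.
    return [l + h for l, h in zip(low_eq, high_eq)]
```

In practice this would be applied independently to each cepstral dimension of an utterance, for example `s_heq(c1_trajectory)` where `c1_trajectory` is the list of values of one coefficient over time.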
