Sub-band based Log-energy and Its Dynamic Range Stretching for Robust In-car Speech Recognition

Abstract

Log energy and its delta parameters, typically derived from full-band spectrum, are commonly used in automatic speech recognition (ASR) systems. In this paper, we address the problem of estimating log energy in the presence of background noise (usually resulting in a reduction in dynamic ranges of spectral energies). We theoretically show that the background noise affects the trajectories of the "conventional" log energy and its delta parameters, resulting in very poor estimation of the actual log energy and its delta parameters, which no longer describe the speech signal. We thus propose to estimate log energy from the sub-band spectrum, followed by a dynamic range stretching. Based on speech recognition experiments conducted on CENSREC-2 in-car database, the proposed log energy (and its corresponding delta parameters) is shown to perform very well, resulting in an average relative improvement of 27.2% compared with the baseline front-ends. Moreover, it is also shown that further improvement can be achieved by incorporating those new MFCCs obtained through non-linear spectral contrast stretching.
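The effect described above can be illustrated with a small numerical sketch. This is not the paper's algorithm: the abstract does not give the sub-band limits or the stretching function, so the bin range `8:32`, the synthetic low-frequency noise profile, the linear stretching rule, and the names `log_energy` and `stretch_dynamic_range` are all assumptions made for illustration. The sketch only assumes that speech and noise powers add, which is enough to show how a noise floor compresses the dynamic range of full-band log energy.

```python
import numpy as np

def log_energy(power_spec, lo_bin=0, hi_bin=None, floor=1e-10):
    """Per-frame log energy summed over FFT bins [lo_bin, hi_bin)."""
    band_energy = power_spec[:, lo_bin:hi_bin].sum(axis=1)
    return np.log(np.maximum(band_energy, floor))

def stretch_dynamic_range(x, target_range):
    """Linearly rescale x so max(x) - min(x) equals target_range.

    The maximum is kept fixed (hypothetical choice, not from the paper).
    """
    span = x.max() - x.min()
    if span == 0:
        return x.copy()
    return x.max() - (x.max() - x) * (target_range / span)

# Synthetic data (assumption): 50 frames, 64 bins.
n_frames, n_bins = 50, 64
envelope = np.logspace(-2, 2, n_frames)      # speech energy per frame
clean = np.zeros((n_frames, n_bins))
clean[:, 8:32] = envelope[:, None] / 24.0    # speech placed in bins 8..31

noise = np.full((n_frames, n_bins), 0.01)    # weak broadband noise
noise[:, 0:8] += 10.0                        # strong low-frequency noise
noisy = clean + noise                        # powers assumed additive

def dyn_range(x):
    return x.max() - x.min()

range_clean = dyn_range(log_energy(clean))
range_full_noisy = dyn_range(log_energy(noisy))
sub_noisy = log_energy(noisy, lo_bin=8, hi_bin=32)
range_sub_noisy = dyn_range(sub_noisy)

# Stretch the sub-band log energy to a target range. Using the clean
# range as the target is a hypothetical choice for this demo only.
stretched = stretch_dynamic_range(sub_noisy, target_range=range_clean)
range_stretched = dyn_range(stretched)

print(range_clean, range_full_noisy, range_sub_noisy, range_stretched)
```

In this toy setting the full-band log energy of the noisy signal spans a much smaller range than the clean one, the sub-band estimate recovers part of that range, and the stretching step restores the remainder. In practice the clean range is not observable, so the target range would have to be chosen some other way.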
