Improved Autocorrelation-Based Noise-Robust Speech Recognition Using Kernel-Based Cross Correlation and Overestimation Parameters

We recently proposed Autocorrelation-based Noise Subtraction (ANS) for robust speech recognition with autocorrelation-based features. That method ignored any cross correlation between the noise and the clean speech signal. This paper proposes a new algorithm that accounts for this cross correlation term when estimating the autocorrelation of the clean speech signal, and introduces an overestimation parameter into that estimate. We also normalize the mean and variance of the energy and cepstral parameters as an additional means of improving the recognition rate. Results on the Aurora2 corpus show that the recognition rate improves when the cross correlation term is considered, and that including the overestimation parameter improves the results further.
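To make the role of the cross correlation term concrete, here is a minimal sketch of the standard decomposition for an additive noise model: if the noisy signal is y = x + n, its autocorrelation is the clean autocorrelation plus the noise autocorrelation plus two cross terms. The function names, the toy signals, and the exact form of the subtraction (including how the overestimation parameter `alpha` enters) are assumptions made for illustration, not the paper's formulation. In particular, the cross term is computed here from the known clean signal, whereas a real system has to estimate it.

```python
import math
import random

def xcorr(a, b, max_lag):
    """Biased one-sided cross correlation: r_ab[k] = (1/N) * sum_i a[i] * b[i + k]."""
    n = len(a)
    return [sum(a[i] * b[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def subtract_noise(r_yy, r_nn_hat, r_cross_hat, alpha=1.0):
    """Hypothetical estimator: r_xx_hat[k] = r_yy[k] - alpha * r_nn_hat[k] - r_cross_hat[k].

    alpha > 1 over-subtracts the noise autocorrelation (an assumed form of
    the overestimation parameter).
    """
    return [ry - alpha * rn - rc
            for ry, rn, rc in zip(r_yy, r_nn_hat, r_cross_hat)]

random.seed(0)
N, MAX_LAG = 256, 8
x = [math.sin(0.2 * i) for i in range(N)]        # toy "clean" signal
n = [random.gauss(0.0, 0.3) for _ in range(N)]   # toy additive noise
y = [xi + ni for xi, ni in zip(x, n)]            # noisy observation

r_xx = xcorr(x, x, MAX_LAG)
r_nn = xcorr(n, n, MAX_LAG)
r_yy = xcorr(y, y, MAX_LAG)
# Cross term r_xn[k] + r_nx[k]; nonzero on a finite frame even when
# the signal and the noise are statistically independent.
r_cross = [a + b for a, b in zip(xcorr(x, n, MAX_LAG), xcorr(n, x, MAX_LAG))]

# With exact noise and cross terms and alpha = 1, the clean
# autocorrelation is recovered up to floating-point error.
r_xx_hat = subtract_noise(r_yy, r_nn, r_cross, alpha=1.0)
# Ignoring the cross term leaves a residual error at each lag.
r_xx_no_cross = subtract_noise(r_yy, r_nn, [0.0] * (MAX_LAG + 1), alpha=1.0)
```

In this toy setting the difference between `r_xx_no_cross` and `r_xx` is exactly the cross term, which is the quantity a plain subtraction leaves uncorrected.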
