Robust features for speech recognition using minimum variance distortionless response (MVDR) spectrum estimation and feature normalization techniques

In this paper, feature extraction methods based on frequency-warped minimum variance distortionless response (MVDR) spectrum estimation are analyzed and tested. The conventional FFT-based mel-frequency cepstral coefficients (MFCC) and the MVDR-based features are compared in terms of effectiveness. Two normalization techniques are further applied to improve the robustness of the features: the widely used cepstral normalization (CN) and the newly proposed progressive histogram equalization (PHEQ). Extensive experiments were performed on the AURORA2 database. The results indicated that both the MVDR-based features and the normalization techniques are very helpful.
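For readers unfamiliar with the MVDR (Capon) spectrum, the following is a minimal sketch of the standard estimator P(w) = 1 / (v(w)^H R^{-1} v(w)), where R is the Toeplitz autocorrelation matrix of a frame and v(w) is the steering vector. It is not the paper's implementation: the function name `mvdr_spectrum`, the model order, the frequency grid, and the diagonal loading are all assumptions chosen for illustration, and the frequency warping and cepstral steps described in the abstract are omitted.

```python
import numpy as np

def mvdr_spectrum(frame, order=12, n_freqs=129, loading=1e-6):
    """Illustrative MVDR (Capon) power spectrum of a single frame.

    All parameter names and defaults are assumptions for this sketch.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    lags = np.arange(order + 1)
    R = r[np.abs(lags[:, None] - lags[None, :])]
    # Small diagonal loading (an assumption) keeps R well conditioned.
    R = R + (loading * r[0] + 1e-12) * np.eye(order + 1)
    # Frequency grid on [0, pi] and steering vectors as columns of V.
    omegas = np.linspace(0.0, np.pi, n_freqs)
    V = np.exp(1j * np.outer(lags, omegas))
    # P(w) = 1 / (v^H R^{-1} v), evaluated for every grid frequency.
    R_inv_V = np.linalg.solve(R, V)
    denom = np.real(np.sum(np.conj(V) * R_inv_V, axis=0))
    return omegas, 1.0 / denom
```

Solving the linear system directly costs more than the fast recursions available for Toeplitz matrices, but it keeps the sketch close to the defining formula.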
