PCA-based human auditory filter bank for speech recognition

Although Mel-frequency cepstral coefficients (MFCCs) have been shown to perform very well under most conditions, only limited effort has been made to optimize the shape of the filters in the filter bank. In addition, the MFCC filter bank does not approximate the critical bandwidth of the human auditory system. This paper presents a new feature extraction approach that (1) decouples filter bandwidth from the other filter-bank parameters, taking inspiration from the critical bands of the human auditory system, and (2) designs the shape of the filters in the filter bank. In this approach, the filter bandwidth is determined from an approximation of the equivalent rectangular bandwidth (ERB) of the critical band, and the filter-bank coefficients are obtained in a data-driven way by applying principal component analysis (PCA) to the FFT spectrum of the training data. Through experiments, we demonstrate the noise robustness of this approach and the improved performance of the resulting recognition systems.
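The two ingredients named above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the widely used Moore and Glasberg (1983) ERB approximation, ERB = 6.23 f^2 + 93.39 f + 28.52 Hz with f in kHz, and it assumes that each filter shape is the first principal component of the training-spectrum bins that fall inside an ERB-wide band around the filter's center frequency. The function names `erb_hz` and `pca_filter`, the array layout, and the choice of the first component are hypothetical.

```python
import numpy as np


def erb_hz(fc_hz):
    """ERB in Hz at center frequency fc_hz (Hz), Moore & Glasberg (1983) fit."""
    f_khz = np.asarray(fc_hz, dtype=float) / 1000.0
    return 6.23 * f_khz**2 + 93.39 * f_khz + 28.52


def pca_filter(spectra, freqs_hz, fc_hz):
    """Data-driven weights for one filter (assumed scheme, see lead-in).

    spectra  : (n_frames, n_bins) FFT magnitude spectra of training frames
    freqs_hz : (n_bins,) center frequency of each FFT bin in Hz
    fc_hz    : center frequency of the filter in Hz
    Returns a length-n_bins weight vector that is zero outside the band.
    """
    # Bandwidth comes only from the ERB formula, not from neighboring filters.
    half_bw = erb_hz(fc_hz) / 2.0
    in_band = np.abs(freqs_hz - fc_hz) <= half_bw

    # PCA on the in-band bins: center the data, then take the SVD.
    band = spectra[:, in_band]
    centered = band - band.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)

    # Rows of vt are principal directions; the sign of each is arbitrary,
    # so flip the first one to have a non-negative sum.
    first_pc = vt[0]
    if first_pc.sum() < 0:
        first_pc = -first_pc

    weights = np.zeros(spectra.shape[1])
    weights[in_band] = first_pc
    return weights
```

A filter bank would then be built by calling `pca_filter` once per center frequency and stacking the resulting rows; how the center frequencies are spaced is left open here because the abstract does not specify it.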
