A discriminative filter bank model for speech recognition

This paper investigates the realization of a filter bank model that achieves minimum classification error. A bank-of-filter feature extractor module is jointly optimized with the classifier's parameters so as to minimize the errors occurring at the back-end classifier, in the framework of the Minimum Classification Error/Generalized Probabilistic Descent (MCE/GPD) method. The method was first applied to readjusting various parameters of filter banks linearly spaced on the Mel scale, for a Japanese vowel recognition task. Analysis of the feature extraction process shows how those parts of the spectrum that are relevant to discrimination are captured. The method was then applied to a multi-speaker word recognition system, where it reduced the word error rate by more than 20%.
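The joint optimization described above can be illustrated with a minimal sketch. It is not the paper's implementation, and several choices are assumptions made only for illustration: the Gaussian filter shape, the Mel formula 2595 log10(1 + f/700), one prototype per class with a negative squared distance as discriminant, the names `filter_bank_features`, `mce_loss`, and `gpd_step`, and a finite-difference gradient standing in for the analytic one. The loss is the usual MCE form, a sigmoid of a misclassification measure that compares the correct class score with a soft maximum over the competing classes.

```python
import math

def hz_to_mel(f_hz):
    # Common Mel-scale convention (an assumption; the paper may use another).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_centers(n_filters, f_lo, f_hi):
    """Center frequencies in Hz, linearly spaced on the Mel scale."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_filters)]

def filter_bank_features(spectrum, bin_hz, centers, bandwidths):
    """Log energy per filter; the Gaussian shape is an illustrative assumption."""
    feats = []
    for c, b in zip(centers, bandwidths):
        energy = sum(p * math.exp(-0.5 * ((k * bin_hz - c) / b) ** 2)
                     for k, p in enumerate(spectrum))
        feats.append(math.log(energy + 1e-10))
    return feats

def discriminants(feats, prototypes):
    """One score per class: negative squared distance to that class prototype."""
    return [-sum((x - p) ** 2 for x, p in zip(feats, proto))
            for proto in prototypes]

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mce_loss(g, label, eta=2.0, alpha=1.0):
    """Smoothed 0-1 loss: sigmoid of the misclassification measure."""
    rivals = [gj for j, gj in enumerate(g) if j != label]
    top = max(rivals)
    # Soft maximum over competing classes, shifted by `top` for stability.
    soft_max = top + math.log(
        sum(math.exp(eta * (gj - top)) for gj in rivals) / len(rivals)) / eta
    return sigmoid(alpha * (soft_max - g[label]))

def sample_loss(params, n_filters, spectrum, bin_hz, label):
    """Loss for one sample; params = centers + bandwidths + flattened prototypes."""
    centers = params[:n_filters]
    bandwidths = params[n_filters:2 * n_filters]
    flat = params[2 * n_filters:]
    prototypes = [flat[i:i + n_filters] for i in range(0, len(flat), n_filters)]
    feats = filter_bank_features(spectrum, bin_hz, centers, bandwidths)
    return mce_loss(discriminants(feats, prototypes), label)

def gpd_step(params, n_filters, spectrum, bin_hz, label, lr=0.1, h=1e-4):
    """One descent step on filter bank AND classifier parameters together.
    The forward-difference gradient is a stand-in for the analytic gradient."""
    base = sample_loss(params, n_filters, spectrum, bin_hz, label)
    grad = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += h
        grad.append((sample_loss(bumped, n_filters, spectrum, bin_hz, label) - base) / h)
    return [p - lr * g for p, g in zip(params, grad)]

if __name__ == "__main__":
    n = 3
    spectrum = [1.0] * 16  # toy power spectrum, 250 Hz per bin
    params = mel_spaced_centers(n, 0.0, 4000.0) + [300.0] * n + [0.0] * n + [2.0] * n
    before = sample_loss(params, n, spectrum, 250.0, label=0)
    params = gpd_step(params, n, spectrum, 250.0, label=0)
    print(before, sample_loss(params, n, spectrum, 250.0, label=0))
```

Because the filter centers, bandwidths, and prototypes sit in one parameter vector, a single step adjusts the feature extractor and the classifier against the same loss, which is the point of the joint optimization.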
