Speaker recognition based on discriminative feature extraction - optimization of mel-cepstral features using second-order all-pass warping function

This paper describes a new framework for designing speaker recognition systems based on discriminative feature extraction (DFE). We apply a mel-cepstral estimation technique as the feature extractor in a Gaussian mixture model (GMM)-based text-independent speaker identification system. The technique uses a second-order all-pass warping function for frequency transformation. We jointly optimize the frequency warping parameters of the feature extractor and the GMM parameters of the classifier under a minimum classification error (MCE) criterion. Experimental results show that the frequency-warped scale obtained after optimization differs from the conventional linear and mel scales; moreover, the proposed system outperforms conventional systems trained with the generalized probabilistic descent (GPD) method, in which only the classifier is optimized.
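To make the MCE criterion concrete, the following is a minimal sketch of the smoothed classification loss in its commonly used textbook form. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `mce_loss`, the smoothing parameters `eta` and `gamma`, and the use of per-speaker log-likelihood scores as discriminant values are all hypothetical choices made here.

```python
import numpy as np

def mce_loss(scores, true_idx, eta=1.0, gamma=1.0):
    """Smoothed 0-1 loss for one utterance (textbook MCE form, assumed here).

    scores   : one discriminant value per speaker, e.g. an average GMM
               log-likelihood (assumption; the abstract gives no formula).
    true_idx : index of the correct speaker.
    eta      : positive sharpness of the soft maximum over rival speakers.
    gamma    : slope of the sigmoid that smooths the 0-1 loss.
    """
    scores = np.asarray(scores, dtype=float)
    g_true = scores[true_idx]
    rivals = np.delete(scores, true_idx)

    # Soft maximum of the rival scores: (1/eta) * log(mean(exp(eta * g_j))).
    # The maximum is subtracted first for numerical stability.
    m = rivals.max()
    g_rival = m + np.log(np.mean(np.exp(eta * (rivals - m)))) / eta

    # Misclassification measure: negative when the true speaker wins.
    d = -g_true + g_rival

    # Sigmoid maps d to (0, 1), giving a differentiable error count.
    return 1.0 / (1.0 + np.exp(-gamma * d))

# Correctly classified utterance: loss close to 0.
low = mce_loss([-1.0, -5.0, -6.0], true_idx=0)
# Misclassified utterance: loss close to 1.
high = mce_loss([-1.0, -5.0, -6.0], true_idx=2)
```

Because this loss is differentiable in the scores, its gradient can in principle be propagated both to the classifier parameters and, through the features, to the parameters of the feature extractor, which is the idea behind the joint optimization described above.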
