Optimizing feature complementarity by evolution strategy: Application to automatic speaker verification

Conventional automatic speaker verification systems are based on cepstral features such as Mel-frequency cepstral coefficients (MFCC) or linear predictive cepstral coefficients (LPCC). Recently published work has shown that using complementary features can significantly improve system performance. In this paper, we propose using an evolution strategy to optimize the complementarity of two filter-bank-based feature extractors. Experiments with a state-of-the-art speaker verification system show that significant improvements can be obtained: compared with standard MFCC, equal error rate (EER) improvements of 11.48% and 21.56% were obtained on the NIST SRE 2005 and NTIMIT databases, respectively. Furthermore, the optimized filter banks highlight the importance of certain specific spectral information for automatic speaker verification.
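The abstract does not spell out the optimizer's internals. As a rough illustration only, the sketch below shows a generic (μ, λ) evolution strategy with a single self-adapted, log-normally mutated step size, in the spirit of Schwefel-style ES. It also includes a simple equal error rate (EER) estimate. The names `mu_comma_lambda_es` and `equal_error_rate`, every parameter value, and the quadratic stand-in fitness are hypothetical. In the paper's setting, the vector `x` would presumably encode the filter-bank parameters of the second feature extractor. The fitness could then be the development-set EER of the system fused with the first extractor. That pairing is an assumption here, not the authors' documented procedure.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over all observed scores.

    FRR = fraction of target scores below the threshold;
    FAR = fraction of non-target scores at or above it.
    Returns the mean of FRR and FAR where the two are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        frr = sum(s < t for s in target) / len(target)
        far = sum(s >= t for s in nontarget) / len(nontarget)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


def mu_comma_lambda_es(
    fitness: Callable[[List[float]], float],
    x0: Sequence[float],
    sigma0: float = 0.5,
    mu: int = 5,
    lam: int = 30,
    generations: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `fitness` with a (mu, lambda)-ES (hypothetical helper).

    Each individual carries one step size sigma, mutated log-normally
    (self-adaptation) before being used to perturb the object variables.
    Comma selection: the next parents are the best mu offspring only.
    """
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(len(x0))  # learning rate for the step-size mutation
    parents = [(list(x0), sigma0)] * mu
    best_x, best_f = list(x0), fitness(list(x0))
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = parents[rng.randrange(mu)]
            child_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
            child = [xi + child_sigma * rng.gauss(0.0, 1.0) for xi in x]
            offspring.append((fitness(child), child, child_sigma))
        offspring.sort(key=lambda item: item[0])  # lower fitness is better
        parents = [(c, s) for _, c, s in offspring[:mu]]
        if offspring[0][0] < best_f:  # keep the best-so-far outside the ES
            best_f, best_x = offspring[0][0], offspring[0][1]
    return best_x, best_f


if __name__ == "__main__":
    # Hypothetical stand-in fitness: squared distance to an arbitrary target.
    # A real run would instead train/fuse verification systems and return EER.
    goal = [1.0, 2.0, 3.0, 4.0]
    sphere = lambda x: sum((a - b) ** 2 for a, b in zip(x, goal))
    x, f = mu_comma_lambda_es(sphere, [0.0] * 4)
    print("best fitness:", f)
    print("EER:", equal_error_rate([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7]))
```

Comma selection discards parents every generation, which helps the step size keep adapting. Because of that, the sketch tracks the best solution seen so far separately. Evaluating a real fitness of this kind would be far more expensive than the toy one, so population sizes and generation counts would need to be chosen with that cost in mind.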