Four weightings and a fusion: a cepstral-SVM system for speaker recognition

We describe a new speaker recognition system that uses Mel-frequency cepstral features. The system combines four support vector machines (SVMs). Each SVM uses polynomial features and a linear inner-product kernel, and each is trained and tested independently. The scores from the four systems are combined with equal weight to produce the final score. We evaluate the combined SVM system on extensive development sets with diverse recording conditions, including the NIST 2003, 2004, and 2005 speaker recognition evaluation datasets and FISHER data. For 1-side training, the combined SVM system performs comparably to a baseline that uses cepstral features with a Gaussian mixture model, and combining the two systems improves on the baseline. For 8-side training, the combined SVM system takes advantage of the additional data and gives a 29% improvement over the baseline system.
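The scoring pipeline summarized in the abstract (polynomial features, a linear inner-product score per SVM, then equal-weight fusion) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the polynomial degree of 2, and the averaging of expanded frames into one utterance vector are all assumptions made here for the example, and the abstract does not specify how the four systems differ.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_expand(frame, degree=2):
    """Return all monomials of the frame's components up to `degree`.

    The constant term 1.0 comes first. The degree is an assumption;
    the abstract only says "polynomial features".
    """
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(frame)), d):
            terms.append(prod(frame[i] for i in idx))
    return terms


def utterance_vector(frames, degree=2):
    """Average the polynomial expansion over all frames of an utterance.

    Frame averaging is an assumption made for this sketch.
    """
    expanded = [polynomial_expand(f, degree) for f in frames]
    n = len(expanded)
    return [sum(col) / n for col in zip(*expanded)]


def linear_score(weights, vector):
    """Linear inner-product score of one (hypothetical) trained SVM model."""
    return sum(w * v for w, v in zip(weights, vector))


def fuse_equal_weight(scores):
    """Combine per-system scores with equal weight (a plain mean)."""
    return sum(scores) / len(scores)
```

In use, each of the four systems would produce its own `linear_score` for a test utterance, and `fuse_equal_weight` would average those four numbers into the final score. In practice, per-system scores are often normalized before averaging; the abstract does not say whether that is done here.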
