The Contribution of Cepstral and Stylistic Features to SRI's 2005 NIST Speaker Recognition Evaluation System

Recent work in speaker recognition has demonstrated the advantage of modeling stylistic features in addition to traditional cepstral features, but to date there has been little study of the relative contributions of these different feature types to a state-of-the-art system. In this paper we provide such an analysis, based on SRI's submission to the NIST 2005 speaker recognition evaluation. The system consists of 7 subsystems (3 cepstral, 4 stylistic). By running independent N-way subsystem combinations for increasing values of N, we find that (1) a monotonic pattern in the choice of the best N systems allows for the inference of subsystem importance; (2) the ordering of subsystems alternates between cepstral and stylistic; (3) syllable-based prosodic features are the strongest stylistic features; and (4) overall subsystem ordering depends crucially on the amount of training data (1 versus 8 conversation sides). Improvements over the baseline cepstral system, when all systems are combined, range from 47% to 67%, with larger improvements for the 8-side condition. These results provide direct evidence of the complementary contributions of cepstral and stylistic features to speaker discrimination.
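The N-way combination analysis described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the equal-weight score averaging in `fuse`, the simple threshold-sweep equal error rate in `equal_error_rate`, and all function and variable names are assumptions chosen for illustration. The sketch searches every subset of size N independently, keeps the best one for each N, and then checks whether the best subsets are nested (the "monotonic pattern"), in which case the order in which subsystems enter gives a ranking of their importance.

```python
from itertools import combinations


def equal_error_rate(scores, labels):
    """Approximate EER (assumed metric): sweep thresholds over the observed
    scores and return the error where miss and false-alarm rates are closest."""
    targets = [s for s, y in zip(scores, labels) if y == 1]
    impostors = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        miss = sum(s < t for s in targets) / len(targets)
        false_alarm = sum(s >= t for s in impostors) / len(impostors)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


def fuse(subsystem_scores, subset):
    """Assumed combiner: equal-weight average of the chosen subsystems' scores."""
    n_trials = len(next(iter(subsystem_scores.values())))
    return [
        sum(subsystem_scores[name][i] for name in subset) / len(subset)
        for i in range(n_trials)
    ]


def best_subset_per_n(subsystem_scores, labels):
    """For each N, search all N-way combinations independently; keep the best."""
    names = sorted(subsystem_scores)
    best = {}
    for n in range(1, len(names) + 1):
        best[n] = min(
            combinations(names, n),
            key=lambda sub: equal_error_rate(fuse(subsystem_scores, sub), labels),
        )
    return best


def importance_order(best):
    """If the best subsets are nested for increasing N, return the order in
    which subsystems are added; otherwise return None (no monotonic pattern)."""
    order, previous = [], set()
    for n in sorted(best):
        current = set(best[n])
        if not previous <= current:
            return None
        order.extend(sorted(current - previous))
        previous = current
    return order


# Toy data (hypothetical): 4 trials, label 1 = target speaker, 0 = impostor.
labels = [1, 1, 0, 0]
toy_scores = {
    "cepstral_a": [0.9, 0.8, 0.1, 0.2],
    "stylistic_b": [0.9, 0.1, 0.2, 0.8],
}
best = best_subset_per_n(toy_scores, labels)
ranking = importance_order(best)
```

Because each N is searched independently, nothing forces the best (N+1)-way subset to contain the best N-way subset; the ranking is only meaningful when `importance_order` finds that nesting in the results.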
