Phone-based Cepstral Polynomial SVM System for Speaker Recognition

We have been using a phone-based cepstral system with polynomial features in NIST evaluations for the past two years. This system uses three broad phone classes, three states per class, and third-order polynomial features obtained from MFCC features. In this paper, we present a complete analysis of the system. We start from a simpler system that does not use phones or states and show that the addition of phones gives a significant improvement. We show that adding state information does not provide improvement on its own but provides a significant improvement when used with phone classes. We complete the system by applying nuisance attribute projection (NAP) and score normalization. We show that splitting features after a joint NAP over all phone classes results in a significant improvement. Overall, we obtain about 25% performance improvement with polynomial features based on phones and states, and obtain a system with performance comparable to a state-of-the-art SVM system.

[1]  William M. Campbell,et al.  Generalized linear discriminant sequence kernels for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[3]  William M. Campbell,et al.  Advances in channel compensation for SVM speaker recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[4]  Hynek Hermansky,et al.  Analysis of variability in speech with applications to speech and speaker recognition , 2002 .

[5]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[6]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[7]  S.S. Kajarekar Four weightings and a fusion: a cepstral-SVM system for speaker recognition , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[8]  Alvin F. Martin,et al.  NIST Speaker Recognition Evaluations Utilizing the Mixer Corpora—2004, 2005, 2006 , 2007, IEEE Transactions on Audio, Speech, and Language Processing.