Paper information - Using Polynomial Kernel Support Vector Machines for Speaker Verification

In this letter, we propose a discriminative modeling approach to speaker verification that uses polynomial kernel support vector machines (PK-SVMs). The approach is rooted in an equivalence between state-of-the-art probabilistic linear discriminant analysis (PLDA) and second-degree polynomial kernel methods. We present two techniques for overcoming the memory and computational challenges that PK-SVMs pose. The first, a kernel evaluation simplification trick, eliminates the need to explicitly compute dot products for a huge number of training samples. The second exploits the massively parallel processing power of modern graphics processing units (GPUs). We performed experiments on the Phase I speaker verification track of the DARPA-sponsored Robust Automatic Transcription of Speech (RATS) program. In the multi-session enrollment experiments, second-degree PK-SVMs outperformed PLDA across all tasks in terms of the official evaluation metric, and third- and fourth-degree PK-SVMs improved further on the second-degree PK-SVMs. Furthermore, for the “30s-30s” task, a linear score combination of the PLDA and PK-SVM based systems provided a 27% improvement relative to the PLDA baseline in terms of the official evaluation metric.
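As background for the polynomial kernel methods mentioned in the abstract, the following minimal sketch illustrates the generic identity that makes such kernels practical: a homogeneous polynomial kernel value equals a dot product in an explicitly expanded feature space, so the expansion never has to be built. This is not the letter's kernel evaluation simplification trick, whose details the abstract does not give. The function names and the toy vectors are assumptions made purely for illustration.

```python
from itertools import product


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, y, degree=2):
    """Homogeneous polynomial kernel: (x . y) ** degree (illustrative form)."""
    return dot(x, y) ** degree


def explicit_features(x, degree=2):
    """Explicit expansion: every ordered product of `degree` coordinates.

    The result has len(x) ** degree entries, which is what the kernel
    form lets us avoid materializing.
    """
    feats = []
    for idx in product(range(len(x)), repeat=degree):
        term = 1.0
        for i in idx:
            term *= x[i]
        feats.append(term)
    return feats


# Toy vectors standing in for speaker representations (hypothetical values).
x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

k_implicit = poly_kernel(x, y)                                 # one dot product, then a power
k_explicit = dot(explicit_features(x), explicit_features(y))   # 9-dimensional dot product

print(k_implicit, k_explicit)
```

Both routes give the same value, but the explicit expansion grows as the input dimension raised to the kernel degree, which hints at why memory and computation become the bottleneck for higher-degree kernels on large training sets.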

Jason W. Pelecanos | Sibel Yaman
