Using Polynomial Kernel Support Vector Machines for Speaker Verification

In this letter, we propose a discriminative modeling approach for the speaker verification problem that uses polynomial kernel support vector machines (PK-SVMs). The proposed approach is rooted in an equivalence relationship between the state-of-the-art probabilistic linear discriminant analysis (PLDA) and second degree polynomial kernel methods. We present two techniques for overcoming the memory and computational challenges that PK-SVMs pose. The first of these, a kernel evaluation simplification trick, eliminates the need to explicitly compute dot products for a huge number of training samples. The second technique makes use of the massively parallel processing power of modern graphical processing units. We performed experiments on the Phase I speaker verification track of the DARPA sponsored Robust Automatic Transcription of Speech (RATS) program. We found that, in the multi-session enrollment experiments, second degree PK-SVMs outperformed PLDA across all tasks in terms of the official evaluation metric, and third and fourth degree PK-SVMs provided a performance improvement over the second degree PK-SVMs. Furthermore, for the “30s-30s” task, a linear score combination between the PLDA and PK-SVM based systems provided 27% improvement relative to the PLDA baseline in terms of the official evaluation metric.

[1]  Kurt Keutzer,et al.  Fast support vector machine training and classification on graphics processors , 2008, ICML '08.

[2]  William M. Campbell,et al.  Support vector machines for speaker and language recognition , 2006, Comput. Speech Lang..

[3]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[4]  Lukás Burget,et al.  Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  Jason W. Pelecanos,et al.  Unifying PLDA and polynomial kernel SVMS , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6]  Mohamed Kamal Omar,et al.  On the Use of Non-Linear Polynomial Kernel SVMs in Language Recognition , 2012, INTERSPEECH.

[7]  Kevin Walker,et al.  The RATS radio traffic collection system , 2012, Odyssey.

[8]  Jason W. Pelecanos,et al.  The IBM RATS phase II speaker recognition system: overview and analysis , 2013, INTERSPEECH.

[9]  James R. Glass,et al.  Cosine Similarity Scoring without Score Normalization Techniques , 2010, Odyssey.

[10]  Niko Brümmer,et al.  The speaker partitioning problem , 2010, Odyssey.

[12]  J. Platt Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[13]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[14]  Pietro Laface,et al.  Fast discriminative speaker verification in the i-vector space , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Federico Girosi,et al.  An improved training algorithm for support vector machines , 1997, Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop.

[16]  Nathan Srebro,et al.  A GPU-tailored approach for training kernelized SVMs , 2011, KDD.

[17]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[18]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[19]  Daniel Garcia-Romero,et al.  Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.

[20]  Patrick Kenny,et al.  Mixture of PLDA Models in i-vector Space for Gender-Independent Speaker Recognition , 2011, INTERSPEECH.