Multiple kernel learning for speaker verification

Many speaker verification (SV) systems combine multiple classifiers using score fusion to improve performance. For SVM classifiers, an alternative strategy is to combine at the kernel level, which requires finding a suitable weighting of the kernels; this is known as multiple kernel learning (MKL). Recently, an efficient maximum-margin scheme for MKL has been proposed. This work examines several refinements to this scheme for SV. The standard scheme has a known tendency towards sparse weightings, which may not be optimal for SV. A regularisation term is proposed, allowing the appropriate level of sparsity to be selected. Cross-speaker tying of kernel weights is also applied to improve robustness. Various combinations of dynamic kernels were evaluated, including derivative and parametric kernels based upon different model structures. Combining five kernels gave an equal error rate (EER) of 4.83% on the NIST 2002 SRE.
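Kernel-level combination can be illustrated with a minimal sketch. It assumes the common MKL formulation in which the combined kernel is a convex combination of base kernels, with non-negative weights that sum to one. The function names (`linear_kernel`, `rbf_kernel`, `combine_kernels`), the toy vectors, and the hand-picked weights are all hypothetical: in MKL the weights are learned by maximum-margin training, and the paper's dynamic kernels, regularisation term, and cross-speaker tying are not reproduced here.

```python
import math


def linear_kernel(xs):
    """Gram matrix of inner products between all pairs of vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in xs] for x in xs]


def rbf_kernel(xs, gamma=0.5):
    """Gram matrix of Gaussian (RBF) similarities; gamma is an assumed width."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in xs]
        for x in xs
    ]


def combine_kernels(grams, weights):
    """Return the weighted sum of Gram matrices and the normalised weights.

    Weights must be non-negative; they are rescaled to sum to one, so the
    result is a convex combination and remains a valid kernel matrix.
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    betas = [w / total for w in weights]
    n = len(grams[0])
    combined = [
        [sum(b * g[i][j] for b, g in zip(betas, grams)) for j in range(n)]
        for i in range(n)
    ]
    return combined, betas


# Toy data: two 2-dimensional vectors standing in for utterance-level features.
xs = [[1.0, 0.0], [0.0, 2.0]]
grams = [linear_kernel(xs), rbf_kernel(xs)]

# Hand-picked, unnormalised weights; an MKL solver would learn these instead.
combined, betas = combine_kernels(grams, [1.0, 3.0])
```

The combined Gram matrix could then be passed to any SVM trainer that accepts a precomputed kernel. A sparse weighting corresponds to most entries of `betas` being zero, which is the behaviour the proposed regularisation term is meant to control.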
