In this paper, we examine the problem of kernel selection for one-versus-all (OVA) classification of multiclass data with support vector machines (SVMs). We focus specifically on the problem of training what we refer to as generalized linear kernels, that is, kernels of the form $k(x_1, x_2) = x_1^T R x_2$, where $R$ is a positive semidefinite matrix. Our approach for training $k(x_1, x_2)$ involves first constructing a set of upper bounds on the rates of false positives and false negatives at a given score threshold. Under various conditions, minimizing these bounds leads to the closed-form solution $R = W^{-1}$, where $W$ is the expected within-class covariance matrix of the data. We tested various parameterizations of $R$, including a diagonal parameterization that simply performs per-feature variance normalization, on the 1-conversation training condition of the SRE-2003 and SRE-2004 speaker recognition tasks. In experiments on a state-of-the-art MLLR-SVM speaker recognition system (A. Stolcke et al., 2005), the parameterization $R = W_s^{-1}$, where $W_s$ is a smoothed estimate of $W$, achieves relative reductions in the minimum decision cost function (DCF) of up to 22% compared with the results obtained when $R$ performs per-feature variance normalization.
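As a rough illustration of the kernel form described in the abstract, the following NumPy sketch estimates a within-class covariance matrix $W$ from labeled data and evaluates $k(x_1, x_2) = x_1^T R x_2$ for a given $R$. It is not the authors' implementation. All function names are hypothetical, the class-frequency weighting is one plausible reading of "expected within-class covariance", and the smoothing rule (interpolating $W$ toward its own diagonal with weight `alpha`) is purely an assumption, since the abstract does not say how $W_s$ is computed. The diagonal variant likewise assumes within-class variances are used for the per-feature normalization.

```python
import numpy as np

def within_class_covariance(X, y):
    """Class-frequency-weighted average of per-class covariance matrices.

    Assumption: this is one plausible estimator of the 'expected
    within-class covariance' W; the abstract gives no formula.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    W = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        centered = Xc - Xc.mean(axis=0)  # remove the class mean
        W += centered.T @ centered       # accumulate class scatter
    return W / n

def smooth_toward_diagonal(W, alpha=0.1):
    """Hypothetical smoothing: interpolate W with its own diagonal.

    Assumption: the abstract does not specify how W_s is obtained.
    """
    return (1.0 - alpha) * W + alpha * np.diag(np.diag(W))

def diagonal_R(W):
    """Diagonal R: per-feature (within-class) variance normalization."""
    return np.diag(1.0 / np.diag(W))

def generalized_linear_kernel(X1, X2, R):
    """Gram matrix with entries k(x1, x2) = x1^T R x2 (rows are samples)."""
    return np.asarray(X1) @ R @ np.asarray(X2).T
```

A Gram matrix built this way, for example `generalized_linear_kernel(X, X, np.linalg.inv(smooth_toward_diagonal(W)))`, could be passed to any SVM trainer that accepts a precomputed kernel. Equivalently, since $R$ is positive semidefinite, one can factor $R = L L^T$ and train an ordinary linear SVM on the transformed features `X @ L`.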
[1] Alexander J. Smola, et al. Neural Information Processing Systems. NIPS 1997, 1997.
[2] Nello Cristianini, et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. 2000.
[3] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science, 2000.
[4] Yves Grandvalet, et al. Adaptive Scaling for Feature Selection in SVMs. NIPS, 2002.
[5] Alexander J. Smola, et al. Hyperkernels. NIPS, 2002.
[6] Nello Cristianini, et al. Learning the Kernel Matrix with Semidefinite Programming. J. Mach. Learn. Res., 2002.
[7] Michael I. Jordan, et al. Multiple kernel learning, conic duality, and the SMO algorithm. ICML, 2004.
[8] Sayan Mukherjee, et al. Choosing Multiple Parameters for Support Vector Machines. Machine Learning, 2002.
[9] Andreas Stolcke, et al. MLLR transforms as features in speaker recognition. INTERSPEECH, 2005.
[10] The NIST Year 2005 Speaker Recognition Evaluation Plan.