Generalized Linear Kernels for One-Versus-All Classification: Application to Speaker Recognition

In this paper, we examine the problem of kernel selection for one-versus-all (OVA) classification of multiclass data with support vector machines (SVMs). We focus specifically on the problem of training what we refer to as generalized linear kernels, that is, kernels of the form k(x_1, x_2) = x_1^T R x_2, where R is a positive semidefinite matrix. Our approach for training k(x_1, x_2) involves first constructing a set of upper bounds on the rates of false positives and false negatives at a given score threshold. Under various conditions, minimizing these bounds leads to the closed-form solution R = W^{-1}, where W is the expected within-class covariance matrix of the data. We tested various parameterizations of R, including a diagonal parameterization that simply performs per-feature variance normalization, on the 1-conversation training condition of the SRE-2003 and SRE-2004 speaker recognition tasks. In experiments on a state-of-the-art MLLR-SVM speaker recognition system (A. Stolcke et al., 2005), the parameterization R = W_s^{-1}, where W_s is a smoothed estimate of W, achieves relative reductions in the minimum decision cost function (DCF) of up to 22% below the results obtained when R performs per-feature variance normalization.
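
For concreteness, the sketch below illustrates one way to realize the kernels described above: estimating the within-class covariance W from labeled training data, forming both the full parameterization R = W_s^{-1} and the diagonal (per-feature variance normalization) parameterization, and evaluating k(x_1, x_2) = x_1^T R x_2 as a precomputed kernel for OVA SVMs. This is a minimal sketch under stated assumptions, not the paper's implementation: the smoothing of W toward a scaled identity, the weight lam, the helper names, and the synthetic data are all illustrative, since the abstract does not specify how W_s is computed.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def within_class_covariance(X, y):
    """Expected within-class covariance W: class-conditional
    covariances weighted by the empirical class priors."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        W += (len(Xc) / len(X)) * np.cov(Xc, rowvar=False, bias=True)
    return W

def generalized_linear_kernel(X1, X2, R):
    """Gram matrix of k(x1, x2) = x1^T R x2 over all row pairs."""
    return X1 @ R @ X2.T

def smoothed_inverse(W, lam=0.9):
    """R = W_s^{-1}. Here W_s interpolates W with a scaled identity;
    this smoothing form and the weight lam are assumptions, standing
    in for the paper's smoothed estimate of W."""
    d = W.shape[0]
    W_s = lam * W + (1.0 - lam) * (np.trace(W) / d) * np.eye(d)
    return np.linalg.inv(W_s)

# Toy demo on synthetic data (a stand-in for MLLR feature vectors).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 5))
y_train = np.repeat([0, 1, 2], 20)
X_test = rng.normal(size=(10, 5))

W = within_class_covariance(X_train, y_train)
R = smoothed_inverse(W)              # full parameterization: R = W_s^{-1}
# Diagonal parameterization (per-feature variance normalization):
# R = np.diag(1.0 / np.diag(W))

ova = OneVsRestClassifier(SVC(kernel="precomputed"))
ova.fit(generalized_linear_kernel(X_train, X_train, R), y_train)
scores = ova.decision_function(generalized_linear_kernel(X_test, X_train, R))
```

Note that with R = I this reduces to the ordinary linear kernel, and with the diagonal parameterization it is equivalent to standardizing each feature before applying a linear kernel; the full W_s^{-1} additionally decorrelates features according to their within-class covariance.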