Performance analysis for $L_2$ kernel classification

We provide statistical performance guarantees for a recently introduced kernel classifier that optimizes the $L_2$ error, or integrated squared error (ISE), of a difference of densities. The classifier is similar to a support vector machine (SVM) in that it is the solution of a quadratic program and yields a sparse classifier. Unlike SVMs, however, the $L_2$ kernel classifier does not involve a regularization parameter. We prove a distribution-free concentration inequality for a cross-validation-based estimate of the ISE, and apply this result to deduce an oracle inequality and consistency of the classifier in the sense of both ISE and probability of error. Our results can also be specialized to give performance guarantees for an existing method of $L_2$ kernel density estimation.
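
For orientation, the ISE criterion can be written out as follows. This is a sketch in assumed notation, not notation taken from the text above: $f_+$ and $f_-$ stand for the two class-conditional densities, $\gamma > 0$ for a fixed weight, $d = f_+ - \gamma f_-$ for the target difference of densities, and $\hat{d}$ for a kernel estimate of $d$.

$$
\mathrm{ISE}(\hat{d}) = \int \bigl(\hat{d}(x) - d(x)\bigr)^2 \, dx
= \int \hat{d}(x)^2 \, dx \;-\; 2 \int \hat{d}(x)\, d(x) \, dx \;+\; \int d(x)^2 \, dx .
$$

The last term does not depend on $\hat{d}$, so minimizing the ISE amounts to minimizing the first two terms. Under the assumed notation the cross term equals $\mathbb{E}_{f_+}[\hat{d}(X)] - \gamma\, \mathbb{E}_{f_-}[\hat{d}(X)]$, which is unknown but can be estimated from the training sample. This is plausibly where a cross-validation-based estimate of the ISE enters.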
