IEEE Transactions on Pattern Analysis and Machine Intelligence

Nonparametric kernel methods are widely used and have proven successful in many statistical learning problems. Well-known examples include the kernel density estimate (KDE) for density estimation and the support vector machine (SVM) for classification. We propose a kernel classifier that optimizes the L2 error, or integrated squared error (ISE), of a “difference of densities”. We focus on the Gaussian kernel, although the method applies to other kernels suitable for density estimation. Like the SVM, the classifier is sparse and results from solving a quadratic program. We provide statistical performance guarantees for the proposed L2 kernel classifier in the form of a finite-sample oracle inequality, and we establish strong consistency in the sense of both ISE and probability of error. A special case of our analysis applies to a previously introduced ISE-based method for kernel density estimation. For dimensionality greater than 15, the basic L2 kernel classifier performs poorly in practice. We therefore extend the method by introducing a natural regularization parameter, which allows it to remain competitive with the SVM in high dimensions. Simulation results for both synthetic and real-world data are presented.
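To make the link between an ISE criterion and a quadratic program concrete, here is a minimal sketch under stated assumptions. It models the difference of densities as a weighted kernel sum, d(x; a) = sum_i a_i c_i k_sigma(x, X_i), with c_i = gamma for one class and -(1 - gamma) for the other. Expanding the ISE and dropping the term that does not depend on a leaves a'Qa - 2b'a, where Q follows from the fact that the integral of a product of two Gaussian kernels of width sigma is a Gaussian kernel of width sqrt(2)*sigma. The class weighting gamma, the leave-one-out estimate of the cross term b, the constraint that weights are nonnegative and sum to one within each class, and all function names are assumptions made for illustration; none of them is stated in the abstract. A simple Frank-Wolfe loop stands in for a proper QP solver to keep the example short.

```python
import numpy as np

def gauss_kernel(A, B, sigma):
    """Gaussian density kernel k_sigma between rows of A (m x d) and B (n x d)."""
    d = A.shape[1]
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2) ** (d / 2.0)

def ise_terms(X, y, sigma, gamma=0.5):
    """Return (Q, b, c): the alpha-dependent part of the ISE is a'Qa - 2 b'a.

    Assumes labels in {+1, -1} and at least two points per class.
    """
    pos = y == 1
    neg = ~pos
    c = np.where(pos, gamma, -(1.0 - gamma))
    # Integral of k_sigma(x, Xi) * k_sigma(x, Xj) dx equals k_{sqrt(2) sigma}(Xi, Xj).
    Q = np.outer(c, c) * gauss_kernel(X, X, np.sqrt(2.0) * sigma)
    # Cross term: leave-one-out sample averages of k_sigma (assumed estimator).
    K = gauss_kernel(X, X, sigma)
    np.fill_diagonal(K, 0.0)
    mean_pos = K[:, pos].sum(axis=1) / (pos.sum() - pos.astype(int))
    mean_neg = K[:, neg].sum(axis=1) / (neg.sum() - neg.astype(int))
    b = c * (gamma * mean_pos - (1.0 - gamma) * mean_neg)
    return Q, b, c

def objective(alpha, Q, b):
    """Data-dependent part of the estimated ISE."""
    return alpha @ Q @ alpha - 2.0 * b @ alpha

def fit_weights(X, y, sigma, gamma=0.5, iters=300):
    """Frank-Wolfe with exact line search over a product of two simplices."""
    Q, b, c = ise_terms(X, y, sigma, gamma)
    blocks = [np.flatnonzero(y == 1), np.flatnonzero(y != 1)]
    alpha = np.zeros(len(y))
    for idx in blocks:
        alpha[idx] = 1.0 / len(idx)  # start from uniform weights per class
    for _ in range(iters):
        grad = 2.0 * (Q @ alpha - b)
        vertex = np.zeros_like(alpha)
        for idx in blocks:
            vertex[idx[np.argmin(grad[idx])]] = 1.0  # best vertex of each simplex
        step_dir = vertex - alpha
        slope = grad @ step_dir
        if slope >= 0.0:  # no feasible descent direction left
            break
        curv = step_dir @ Q @ step_dir
        eta = 1.0 if curv <= 0.0 else min(1.0, -slope / (2.0 * curv))
        alpha = alpha + eta * step_dir
    return alpha, c

def predict(X_train, alpha, c, sigma, X_new):
    """Classify by the sign of the estimated difference of densities."""
    score = gauss_kernel(X_new, X_train, sigma) @ (alpha * c)
    return np.where(score >= 0.0, 1, -1)
```

Because Q is the elementwise product of two positive semidefinite matrices, the objective is convex, so the line search never increases it. This loop does not return exactly sparse weights; a dedicated QP solver would be the natural choice in practice.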
