Paper information - IEEE Transactions on Pattern Analysis and Machine Intelligence

Nonparametric kernel methods are widely used and have proven successful in many statistical learning problems. Well-known examples include the kernel density estimate (KDE) for density estimation and the support vector machine (SVM) for classification. We propose a kernel classifier that optimizes the L2 or integrated squared error (ISE) of a "difference of densities". We focus on the Gaussian kernel, although the method applies to other kernels suitable for density estimation. Like the SVM, the classifier is sparse and is obtained by solving a quadratic program. We provide statistical performance guarantees for the proposed L2 kernel classifier in the form of a finite sample oracle inequality, and we establish strong consistency in terms of both the ISE and the probability of error. A special case of our analysis applies to a previously introduced ISE-based method for kernel density estimation. For dimensionality greater than 15, the basic L2 kernel classifier performs poorly in practice. We therefore extend the method by introducing a natural regularization parameter, which allows it to remain competitive with the SVM in high dimensions. Simulation results on both synthetic and real-world data are presented.

C. Scott | JooSeuk Kim
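The abstract states that the classifier comes from a quadratic program over a weighted difference of Gaussian kernel density estimates. The Python sketch below illustrates one way such a program can be set up. It is a reconstruction under stated assumptions, not the authors' code. Assumptions, none of which are spelled out in the abstract: the estimate has the form f(x) = sum_i alpha_i y_i k_sigma(x, X_i) with labels y_i in {+1, -1}; the weights satisfy alpha_i >= 0, sum to 1 over class +1 and to gamma over class -1; the squared term of the ISE is expanded with the Gaussian identity that the integral of k_sigma(x, a) k_sigma(x, b) over x equals k_{sqrt(2) sigma}(a, b); and the unknown cross term is replaced by a leave-one-out kernel estimate. The solver is plain projected gradient descent, swapped in only to keep the example dependency-free, and the high-dimensional regularization parameter is not modeled. The function names `gauss`, `project_simplex`, `fit_l2_kernel_classifier`, and `decision` are hypothetical.

```python
import math

# Hypothetical sketch of an assumed L2 (ISE) kernel classifier formulation.
# Not the authors' implementation; the solver is a generic projected gradient.


def gauss(x, z, sigma):
    """Isotropic Gaussian density N(x; z, sigma^2 I) evaluated at point x."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2) ** (d / 2.0)


def project_simplex(v, s):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = s}, assuming s > 0."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        css += uk
        t = (css - s) / k
        if uk - t > 0:
            theta = t  # the last k that passes the test defines the threshold
    return [max(vi - theta, 0.0) for vi in v]


def fit_l2_kernel_classifier(X, y, sigma, gamma=1.0, iters=3000):
    """Return weights alpha for f(x) = sum_i alpha_i y_i k_sigma(x, X_i).

    Assumed objective (ISE up to a constant): alpha^T H alpha - 2 c^T alpha,
    subject to alpha >= 0, sum over class +1 equal to 1, and sum over
    class -1 equal to gamma. Each class needs at least two points.
    """
    n = len(X)
    pos = [i for i in range(n) if y[i] == 1]
    neg = [i for i in range(n) if y[i] == -1]
    # Quadratic term: the integral of a product of two width-sigma Gaussians
    # is a Gaussian of width sqrt(2) * sigma between the two centers.
    H = [[y[i] * y[j] * gauss(X[i], X[j], math.sqrt(2.0) * sigma)
          for j in range(n)] for i in range(n)]
    # Linear term: leave-one-out estimate of the integral of k_sigma(., X_i) d_gamma.
    c = []
    for i in range(n):
        p_hat = sum(gauss(X[j], X[i], sigma) for j in pos if j != i)
        p_hat /= len(pos) - (1 if y[i] == 1 else 0)
        q_hat = sum(gauss(X[j], X[i], sigma) for j in neg if j != i)
        q_hat /= len(neg) - (1 if y[i] == -1 else 0)
        c.append(y[i] * (p_hat - gamma * q_hat))
    # Step size 1/L, where L = 2 * (max absolute row sum of H) bounds the
    # Lipschitz constant of the gradient 2 (H alpha - c).
    step = 1.0 / (2.0 * max(sum(abs(h) for h in row) for row in H))
    # Start from the plain difference of KDEs (uniform weights within each class).
    alpha = [1.0 / len(pos) if y[i] == 1 else gamma / len(neg) for i in range(n)]
    for _ in range(iters):
        grad = [2.0 * (sum(H[i][j] * alpha[j] for j in range(n)) - c[i])
                for i in range(n)]
        z = [a - step * g for a, g in zip(alpha, grad)]
        # Project each class block onto its own scaled simplex.
        zp = project_simplex([z[i] for i in pos], 1.0)
        zn = project_simplex([z[i] for i in neg], gamma)
        for k, i in enumerate(pos):
            alpha[i] = zp[k]
        for k, i in enumerate(neg):
            alpha[i] = zn[k]
    return alpha


def decision(x, X, y, alpha, sigma):
    """Estimated difference of densities at x; its sign is the predicted label."""
    return sum(a * yi * gauss(x, xi, sigma) for a, yi, xi in zip(alpha, y, X) if a > 0)


# Usage on two small, well-separated toy clusters.
X = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (3.0, 3.0), (3.2, 2.8), (2.9, 3.3)]
y = [1, 1, 1, -1, -1, -1]
alpha = fit_l2_kernel_classifier(X, y, sigma=0.5)
print([round(a, 3) for a in alpha])
print(decision((0.2, 0.2), X, y, alpha, 0.5) > 0)  # True
```

On such data some entries of `alpha` may shrink to zero, which mirrors the sparsity described in the abstract; the actual degree of sparsity depends on sigma, gamma, and the data. A dedicated QP solver would be the natural replacement for the projected gradient loop on larger problems.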
