Consistency of support vector machines using additive kernels for additive models

Support vector machines (SVMs) are kernel-based methods that have been among the most successful learning methods for more than a decade. Informally, SVMs are regularized M-estimators for functions, and they have proved useful in many complicated real-life problems. In recent years, much of the statistical research on SVMs has concentrated on how to design SVMs that are universally consistent and statistically robust for nonparametric classification or nonparametric regression. In many applications, however, some qualitative prior knowledge of the distribution P or of the unknown function f to be estimated is available, or a prediction function with good interpretability is desired, so that a semiparametric model or an additive model is of interest. This paper addresses the question of how to choose the reproducing kernel Hilbert space (RKHS) or its corresponding kernel so that the resulting SVM is a consistent and statistically robust estimator in additive models. An explicit construction of such RKHSs and their kernels, called additive kernels, is given; SVMs based on additive kernels are called additive support vector machines. Combined with a Lipschitz continuous loss function, additive kernels yield SVMs with the desired properties for additive models. Examples include quantile regression based on the pinball loss function, regression based on the ε-insensitive loss function, and classification based on the hinge loss function.
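
The ingredients named in the abstract can be illustrated with a minimal sketch. It assumes that an additive kernel is a sum of kernels, each acting on a single input coordinate; the abstract does not spell out the construction, so this form, the choice of a Gaussian kernel per coordinate, and all function and parameter names (`gamma`, `tau`, `eps`) are illustrative assumptions rather than the paper's definitions. The three loss functions follow their standard textbook forms, and each is Lipschitz continuous in the prediction `t`.

```python
import math


def gaussian_1d(s, t, gamma=1.0):
    """Gaussian RBF kernel on one real-valued coordinate (assumed choice)."""
    return math.exp(-gamma * (s - t) ** 2)


def additive_kernel(x, y, gamma=1.0):
    """Assumed additive kernel: sum of one-dimensional kernels, one per coordinate."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    return sum(gaussian_1d(s, t, gamma) for s, t in zip(x, y))


def pinball_loss(y, t, tau=0.5):
    """Pinball loss for quantile regression at level tau in (0, 1)."""
    r = y - t
    return tau * r if r >= 0 else (tau - 1.0) * r


def eps_insensitive_loss(y, t, eps=0.1):
    """Epsilon-insensitive loss: residuals smaller than eps cost nothing."""
    return max(0.0, abs(y - t) - eps)


def hinge_loss(y, t):
    """Hinge loss for classification with labels y in {-1, +1}."""
    return max(0.0, 1.0 - y * t)
```

Because a sum of positive semidefinite kernels is again a kernel, a function such as `additive_kernel` could be used to build a Gram matrix for any SVM solver that accepts precomputed kernels; which solver to use is left open here.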
