Adaptive learning rates for support vector machines working on data with low intrinsic dimension

We derive improved regression and classification rates for support vector machines using Gaussian kernels under the assumption that the data has a low-dimensional intrinsic structure, described by its box-counting dimension. Under standard regularity assumptions for regression and classification, we prove learning rates in which the dimension of the ambient space is replaced by the box-counting dimension of the support of the data-generating distribution. In the regression case our rates are minimax optimal, while in the classification case they match the form of the best known rates. Furthermore, we show that a training-validation approach for choosing the hyperparameters of an SVM in a data-dependent way achieves the same rates adaptively, that is, without any knowledge of the data-generating distribution.
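
To illustrate the kind of data-dependent hyperparameter selection referred to above, the sketch below splits the sample into a training part and a validation part, fits Gaussian-kernel SVMs over grids of the regularization parameter C and the kernel width gamma, and keeps the pair with the smallest validation risk. The use of scikit-learn, the particular grids, and the split fraction are illustrative assumptions, not the procedure analyzed in the paper.

```python
# Minimal sketch of a training-validation approach for choosing the
# hyperparameters (regularization C, Gaussian kernel width gamma) of an SVM.
# Library, grids, and split fraction are assumptions made for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error


def select_gaussian_svm(X, y, C_grid=None, gamma_grid=None,
                        val_fraction=0.3, seed=0):
    """Pick (C, gamma) by empirical risk on a held-out validation set."""
    C_grid = C_grid if C_grid is not None else np.logspace(-2, 3, 6)
    gamma_grid = gamma_grid if gamma_grid is not None else np.logspace(-3, 2, 6)

    # Split the sample into a training part and a validation part.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=val_fraction, random_state=seed
    )

    best = None
    for C in C_grid:
        for gamma in gamma_grid:
            # Gaussian (RBF) kernel SVM fitted on the training part only.
            model = SVR(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
            # Validation risk (least-squares loss in the regression case).
            risk = mean_squared_error(y_val, model.predict(X_val))
            if best is None or risk < best[0]:
                best = (risk, C, gamma, model)

    return best[1], best[2], best[3]  # chosen C, chosen gamma, fitted model
```

In analyses of this type the candidate grids are typically refined as the sample size grows; this refinement is what yields adaptivity to unknown quantities such as the intrinsic dimension of the support.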
