Exact high-dimensional asymptotics for support vector machine

The support vector machine (SVM) is one of the most widely used classification methods. In this paper, we consider the soft-margin support vector machine applied to data points with independent features, where the sample size $n$ and the feature dimension $p$ grow to $\infty$ at a fixed ratio $p/n\rightarrow \delta$. We propose a set of equations that exactly characterizes the asymptotic behavior of the support vector machine. In particular, we give exact formulas for (1) the variability of the optimal coefficients, (2) the proportion of data points lying on the margin boundary (i.e., the number of support vectors), (3) the final objective function value, and (4) the expected misclassification error on new data points; the last of these in particular yields an exact formula for the optimal tuning parameter under a given data-generating mechanism. We first consider the global null case, where the label $y\in\{+1,-1\}$ is independent of the feature $x$, and then the signaled case, where the label $y\in\{+1,-1\}$ is allowed to have a general dependence on the feature $x$ through a linear combination $a_0^Tx$. These results for the non-smooth hinge loss serve as an analogue to the recent results of \citet{sur2018modern} for the smooth logistic loss. Our approach is based on heuristic leave-one-out calculations.
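For reference, one standard penalized form of the soft-margin SVM consistent with this setup is the hinge-loss problem below; the $1/n$ normalization and the ridge-penalty scaling are assumptions for illustration, and the paper's exact convention may differ:
$$\hat w \;=\; \operatorname*{arg\,min}_{w\in\mathbb{R}^p}\;\; \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i\, w^T x_i\bigr) \;+\; \frac{\lambda}{2}\,\|w\|_2^2,$$
where $\max(0,1-t)$ is the hinge loss and $\lambda>0$ plays the role of the tuning parameter referred to in item (4).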

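The asymptotic quantities above can be checked by Monte Carlo simulation. The sketch below is not the paper's method (which is analytic); it simply fits a soft-margin linear SVM on Gaussian data in the proportional regime $p/n=\delta$ and empirically measures items (2) and (4). The signal strength `gamma` and the logistic link used to generate labels from $a_0^Tx$ are illustrative assumptions; `gamma = 0` recovers the global null case.

```python
# Monte Carlo sketch of the proportional-asymptotics setup (illustrative only).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, delta = 400, 0.5              # sample size and aspect ratio p/n -> delta
p = int(delta * n)
gamma = 4.0                      # assumed signal strength; 0 gives the global null

a0 = rng.standard_normal(p)
a0 *= gamma / np.linalg.norm(a0) # coefficient vector a_0 with ||a_0|| = gamma

def draw(m):
    """Independent Gaussian features; labels depend on x through a_0^T x
    via an assumed logistic link (any link through a_0^T x would fit the setup)."""
    X = rng.standard_normal((m, p))
    prob = 1.0 / (1.0 + np.exp(-X @ a0))
    y = np.where(rng.random(m) < prob, 1, -1)
    return X, y

X, y = draw(n)
svm = SVC(kernel="linear", C=1.0).fit(X, y)   # soft-margin hinge-loss SVM

# (2) empirical proportion of support vectors, (4) test misclassification error
X_test, y_test = draw(20000)
print("fraction of support vectors:", len(svm.support_) / n)
print("test error:", np.mean(svm.predict(X_test) != y_test))
```

Sweeping `C` (equivalently $\lambda$) in this simulation and minimizing the test error gives an empirical analogue of the optimal tuning parameter characterized exactly by the equations in the paper.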
[1] Noureddine El Karoui. Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results. arXiv:1311.2445, 2013.

[2] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science. Springer, 2000.

[3] Vladimir Vapnik. The Support Vector Method of Function Estimation. 1998.

[4] Mohsen Bayati and Andrea Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE International Symposium on Information Theory, 2010.

[5] Thomas M. Cover. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Transactions on Electronic Computers, 1965.

[6] David L. Donoho, Arian Maleki, and Andrea Montanari. The Noise-Sensitivity Phase Transition in Compressed Sensing. IEEE Transactions on Information Theory, 2010.

[7] Vladimir Vapnik and Olivier Chapelle. Bounds on Error Expectation for Support Vector Machines. Neural Computation, 2000.

[8] Noureddine El Karoui, Derek Bean, Peter J. Bickel, Chinghway Lim, and Bin Yu. On robust regression with high-dimensional predictors. Proceedings of the National Academy of Sciences, 2013.

[9] Derek Bean, Peter J. Bickel, Noureddine El Karoui, and Bin Yu. Optimal M-estimation in high-dimensional regression. Proceedings of the National Academy of Sciences, 2013.

[10] David Donoho and Andrea Montanari. High dimensional robust M-estimation: asymptotic variance via approximate message passing. Probability Theory and Related Fields, 2013.

[11] David L. Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 2009.

[12] Pragya Sur and Emmanuel J. Candès. A modern maximum-likelihood theory for high-dimensional logistic regression. Proceedings of the National Academy of Sciences, 2018.

[13] Adel Javanmard and Andrea Montanari. State Evolution for General Approximate Message Passing Algorithms, with Applications to Spatial Coupling. arXiv preprint, 2012.

[14] Hanwen Huang. Asymptotic behavior of Support Vector Machine for spiked population model. Journal of Machine Learning Research, 2017.

[15] Pragya Sur, Yuxin Chen, and Emmanuel J. Candès. The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square. Probability Theory and Related Fields, 2017.