A semismooth Newton method for support vector classification and regression

The support vector machine (SVM) is an important and fundamental technique in machine learning. In this paper, we apply a semismooth Newton method to solve two typical SVM models: the L2-loss SVC model and the $\epsilon$-L2-loss SVR model. The semismooth Newton method is widely used in the optimization community. It is commonly believed to have a fast convergence rate but also a high computational complexity. Our contribution in this paper is that, by exploiting the sparse structure of the models, we significantly reduce the computational complexity while keeping the quadratic convergence rate. Extensive numerical experiments demonstrate the outstanding performance of the semismooth Newton method, especially for problems with very large data sets (for the news20.binary problem, with 19,996 samples and 1,355,191 features, it takes only 3 s). In particular, for the $\epsilon$-L2-loss SVR model, the semismooth Newton method significantly outperforms the leading solvers, including DCD and TRON.
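To make the idea concrete, the following is a minimal sketch of a semismooth Newton iteration for an L2-loss SVC problem. It is not the authors' implementation. The abstract does not state the model, so the sketch assumes the standard bias-free formulation $\min_w \frac{1}{2}\|w\|^2 + C\sum_i \max(0, 1 - y_i w^\top x_i)^2$. All function names, the conjugate gradient inner solver, and the Armijo line search are illustrative choices. The sparsity shows up in `hess_vec`: only samples with a positive residual contribute to the generalized Hessian.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(w, X, y, C):
    # 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * w.x_i)^2  (assumed model)
    loss = sum(max(0.0, 1.0 - yi * dot(xi, w)) ** 2 for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * loss


def gradient(w, X, y, C):
    g = list(w)
    for xi, yi in zip(X, y):
        r = 1.0 - yi * dot(xi, w)
        if r > 0.0:  # only samples violating the margin contribute
            for j, xij in enumerate(xi):
                g[j] -= 2.0 * C * r * yi * xij
    return g


def hess_vec(v, active_rows, C):
    # Generalized Hessian times v: (I + 2C * sum_{i active} x_i x_i^T) v,
    # computed without ever forming the matrix.
    out = list(v)
    for xi in active_rows:
        s = 2.0 * C * dot(xi, v)
        for j, xij in enumerate(xi):
            out[j] += s * xij
    return out


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    # Solves A x = b for a symmetric positive definite A given as matvec.
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def semismooth_newton_svc(X, y, C=1.0, tol=1e-8, max_iter=50):
    w = [0.0] * len(X[0])
    for _ in range(max_iter):
        g = gradient(w, X, y, C)
        if dot(g, g) ** 0.5 <= tol:
            break
        # Active set: samples with positive residual at the current w.
        active = [xi for xi, yi in zip(X, y) if 1.0 - yi * dot(xi, w) > 0.0]
        d = conjugate_gradient(lambda v: hess_vec(v, active, C),
                               [-gi for gi in g])
        # Backtracking (Armijo) line search along the Newton direction.
        f0, slope, t = objective(w, X, y, C), dot(g, d), 1.0
        while t > 1e-12:
            w_new = [wi + t * di for wi, di in zip(w, d)]
            if objective(w_new, X, y, C) <= f0 + 1e-4 * t * slope:
                break
            t *= 0.5
        w = w_new
    return w


# Tiny linearly separable toy problem (made-up data for illustration).
X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1.0, 1.0, -1.0, -1.0]
w = semismooth_newton_svc(X, y, C=1.0)
```

Because the Hessian-vector product touches only the active samples, the cost of each inner conjugate gradient step scales with the size of the active set rather than with the full data set, which is the kind of structure the abstract refers to.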
