A fast algorithm for training support vector regression via smoothed primal function minimization

The support vector regression (SVR) model is usually fitted by solving a quadratic programming problem, which is computationally expensive. To improve computational efficiency, we propose to minimize the objective function directly in its primal form. However, the loss function used by SVR is not differentiable, which rules out well-developed gradient-based optimization methods. We therefore introduce a smooth function that approximates the original loss function in the primal form of SVR, transforming the original quadratic program into a convex unconstrained minimization problem. We discuss the properties of the proposed smoothed objective function and prove that the solution of the smoothly approximated model converges to the original SVR solution. A conjugate gradient algorithm is designed to minimize the smoothed objective function in a sequential minimization manner. Extensive experiments on real-world datasets show that, compared to quadratic programming based SVR, the proposed approach achieves similar prediction accuracy with significantly improved computational efficiency. Specifically, it is hundreds of times faster for the linear SVR model and several times faster for the nonlinear SVR model.
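The idea can be illustrated with a minimal sketch for one-dimensional linear SVR. The abstract does not state which smooth function the authors use, so the softplus-based approximation of the epsilon-insensitive loss below is an assumption, as are the Polak-Ribiere conjugate gradient update, the Armijo backtracking line search, the staged increase of the smoothing parameter `alpha` (one possible reading of "sequential minimization"), and every function and parameter name.

```python
import math

def softplus(u, alpha):
    """Smooth stand-in for max(u, 0); approaches it as alpha grows (assumed smoother)."""
    z = alpha * u
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / alpha

def sigmoid(z):
    """Numerically stable logistic function; sigmoid(alpha*u) is d/du softplus(u, alpha)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def smoothed_loss(r, eps, alpha):
    """Smooth approximation of the epsilon-insensitive loss max(|r| - eps, 0)."""
    return softplus(r - eps, alpha) + softplus(-r - eps, alpha)

def objective_and_grad(theta, xs, ys, eps, alpha, lam):
    """Regularized smoothed primal objective for y ~ w*x + b, with its gradient."""
    w, b = theta
    n = len(xs)
    f = 0.5 * lam * w * w
    gw, gb = lam * w, 0.0
    for x, y in zip(xs, ys):
        r = y - w * x - b
        f += smoothed_loss(r, eps, alpha) / n
        dr = sigmoid(alpha * (r - eps)) - sigmoid(alpha * (-r - eps))
        gw -= dr * x / n
        gb -= dr / n
    return f, [gw, gb]

def minimize_cg(theta, xs, ys, eps, alpha, lam, iters=200, tol=1e-8):
    """Nonlinear conjugate gradient (Polak-Ribiere+) with Armijo backtracking."""
    f, g = objective_and_grad(theta, xs, ys, eps, alpha, lam)
    d = [-gi for gi in g]
    for _ in range(iters):
        gg = sum(gi * gi for gi in g)
        if gg < tol * tol:
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:  # not a descent direction: restart with steepest descent
            d, slope = [-gi for gi in g], -gg
        step, accepted = 1.0, False
        for _halving in range(60):
            cand = [t + step * di for t, di in zip(theta, d)]
            f_new, g_new = objective_and_grad(cand, xs, ys, eps, alpha, lam)
            if f_new <= f + 1e-4 * step * slope:
                accepted = True
                break
            step *= 0.5
        if not accepted:
            break
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / gg)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        theta, f, g = cand, f_new, g_new
    return theta

def fit_smoothed_svr(xs, ys, eps=0.1, lam=1e-3, alphas=(1.0, 10.0, 100.0)):
    """Solve a sequence of smoothed problems, warm-starting each from the last."""
    theta = [0.0, 0.0]
    for alpha in alphas:
        theta = minimize_cg(theta, xs, ys, eps, alpha, lam)
    return theta

# Toy data on the line y = 2x + 1 (illustrative only, not from the paper).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_smoothed_svr(xs, ys)
print(w, b)
```

Because the smoothed objective is convex and differentiable, any gradient-based solver could replace the hand-written conjugate gradient loop. Larger values of `alpha` track the original loss more closely but make the problem harder to condition, which is why this sketch warm-starts each stage from the previous solution.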
