Approximate Stochastic Subgradient Estimation Training for Support Vector Machines

Subgradient algorithms for training support vector machines have been quite successful in solving large-scale and online learning problems. However, they have been restricted to linear kernels and strongly convex formulations. This paper describes efficient subgradient approaches without such limitations. Our approaches use randomized low-dimensional approximations to nonlinear kernels and minimize a reduced primal formulation with an algorithm based on robust stochastic approximation, which does not require strong convexity. Experiments show that our approaches produce solutions whose prediction accuracy is comparable to that of solutions obtained from existing SVM solvers, often in much less time. We also suggest efficient prediction schemes whose cost depends only on the dimension of the kernel approximation, not on the number of support vectors.
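As a rough illustration of the two ingredients named in the abstract, the following Python sketch pairs a random Fourier feature map for a Gaussian kernel with an averaged stochastic subgradient method on the regularized hinge loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the XOR-style toy data, the 1/sqrt(t) step size, the uniform iterate averaging, and all parameter values are illustrative choices made here.

```python
import math
import random


def make_random_features(dim_in, dim_out, gamma, rng):
    """Random Fourier feature map approximating exp(-gamma * ||x - y||^2)."""
    # Assumption: frequencies ~ N(0, 2*gamma*I), phases ~ U[0, 2*pi].
    std = math.sqrt(2.0 * gamma)
    W = [[rng.gauss(0.0, std) for _ in range(dim_in)] for _ in range(dim_out)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim_out)]
    scale = math.sqrt(2.0 / dim_out)

    def phi(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
                for w, bj in zip(W, b)]

    return phi


def train_averaged_subgradient(Z, y, lam, n_steps, step_scale, rng):
    """Stochastic subgradient steps on lam/2*||w||^2 + hinge loss, with averaging.

    Assumption: step size step_scale/sqrt(t) and a uniform running average of
    the iterates, a simplified stand-in for robust stochastic approximation.
    """
    d = len(Z[0])
    w = [0.0] * d
    w_avg = [0.0] * d
    for t in range(1, n_steps + 1):
        i = rng.randrange(len(Z))
        eta = step_scale / math.sqrt(t)
        margin = y[i] * sum(wj * zj for wj, zj in zip(w, Z[i]))
        hinge_active = margin < 1.0
        # Subgradient of the sampled objective: lam*w - y_i*z_i if margin < 1.
        w = [wj - eta * (lam * wj - (y[i] * zj if hinge_active else 0.0))
             for wj, zj in zip(w, Z[i])]
        # Update the running average of all iterates seen so far.
        w_avg = [a + (wj - a) / t for a, wj in zip(w_avg, w)]
    return w_avg


def predict(w, phi, x):
    """Predict a label in {-1, +1}; cost depends on len(w), not on support vectors."""
    score = sum(wj * zj for wj, zj in zip(w, phi(x)))
    return 1 if score >= 0.0 else -1


def demo_xor(seed=0, n_train=200, n_test=200, dim_out=200):
    """Train on hypothetical XOR-style clusters, which no linear kernel separates."""
    rng = random.Random(seed)

    def sample(n):
        X, y = [], []
        for _ in range(n):
            sx, sy = rng.choice([-1, 1]), rng.choice([-1, 1])
            X.append([sx + rng.gauss(0.0, 0.2), sy + rng.gauss(0.0, 0.2)])
            y.append(sx * sy)
        return X, y

    phi = make_random_features(2, dim_out, gamma=1.0, rng=rng)
    X_tr, y_tr = sample(n_train)
    X_te, y_te = sample(n_test)
    Z_tr = [phi(x) for x in X_tr]
    w = train_averaged_subgradient(Z_tr, y_tr, lam=1e-3, n_steps=2000,
                                   step_scale=1.0, rng=rng)
    correct = sum(predict(w, phi, x) == t for x, t in zip(X_te, y_te))
    return correct / n_test


if __name__ == "__main__":
    print("held-out accuracy:", demo_xor())
```

In this sketch the trained model is a single weight vector of length `dim_out`, so each prediction costs one feature-map evaluation and one inner product regardless of how many training points would have been support vectors in a dual kernel solver.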
