Dual SVM Training on a Budget

We present a dual subspace ascent algorithm for support vector machine training that respects a budget constraint limiting the number of support vectors. Budget methods are effective for reducing the training time of kernel SVMs while retaining high accuracy. To date, however, budget training has been available only for primal (SGD-based) solvers. Dual subspace ascent methods such as sequential minimal optimization are attractive for their good adaptation to the problem structure, their fast convergence rate, and their practical speed. By incorporating a budget constraint into a dual algorithm, our method enjoys the best of both worlds. We demonstrate considerable speed-ups over primal budget training methods.
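To make the setting concrete, the following is a minimal sketch, not the paper's algorithm: coordinate ascent on the SVM dual without offset, combined with a simple removal-based budget maintenance step whenever admitting a new support vector would exceed the budget. All names (`budgeted_dual_ascent`, `rbf_kernel`) and the choice of removal rather than merging are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Pairwise RBF kernel matrix k(x, z) = exp(-gamma * ||x - z||^2).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def budgeted_dual_ascent(X, y, C=1.0, budget=50, gamma=1.0, epochs=10, seed=0):
    """Coordinate ascent on the SVM dual (no offset), keeping at most
    `budget` support vectors via removal-based budget maintenance."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)  # full kernel matrix; acceptable for a small sketch
    alpha = np.zeros(n)

    for _ in range(epochs):
        for i in rng.permutation(n):
            # Gradient of the dual objective with respect to alpha_i.
            g = 1.0 - y[i] * np.sum(alpha * y * K[:, i])
            new_ai = float(np.clip(alpha[i] + g / K[i, i], 0.0, C))
            if new_ai > 0.0 and alpha[i] == 0.0 and np.count_nonzero(alpha) >= budget:
                # Budget maintenance: drop the support vector with the
                # smallest dual coefficient before admitting a new one.
                sv = np.flatnonzero(alpha)
                alpha[sv[np.argmin(alpha[sv])]] = 0.0
            alpha[i] = new_ai

    sv = np.flatnonzero(alpha)
    return alpha[sv], X[sv], y[sv]

def predict(x, sv_alpha, sv_X, sv_y, gamma=1.0):
    # Sign of the kernel expansion over the retained support vectors.
    k = rbf_kernel(x[None, :], sv_X, gamma)[0]
    return np.sign(np.sum(sv_alpha * sv_y * k))
```

Removing the support vector with the smallest coefficient is the crudest maintenance strategy; merging two support vectors into a single weighted point typically degrades the dual objective less, at the cost of extra bookkeeping.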
