Beating SGD: Learning SVMs in Sublinear Time

We present an optimization approach for linear SVMs based on a stochastic primal-dual approach, where the primal step is akin to an importance-weighted SGD, and the dual step is a stochastic update on the importance weights. This yields an optimization method with a sublinear dependence on the training set size, and the first method for learning linear SVMs with runtime less then the size of the training set required for learning!

[1]  Russell Greiner,et al.  Learning and Classifying Under Hard Budgets , 2005, ECML.

[2]  Claudio Gentile,et al.  On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.

[3]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[4]  Éva Tardos,et al.  Fast Approximation Algorithms for Fractional Packing and Covering Problems , 1995, Math. Oper. Res..

[5]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[6]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[7]  David P. Woodruff,et al.  Sublinear Optimization for Machine Learning , 2010, FOCS.

[8]  Kun Deng,et al.  Bandit-Based Algorithms for Budgeted Learning , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[9]  S. Sathiya Keerthi,et al.  A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs , 2005, J. Mach. Learn. Res..

[10]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Claudio Gentile,et al.  The Robustness of the p-Norm Algorithms , 1999, COLT '99.

[12]  Elad Hazan The convex optimization approach to regret minimization , 2011 .

[13]  Ambuj Tewari,et al.  Smoothness, Low Noise and Fast Rates , 2010, NIPS.

[14]  Martin Zinkevich,et al.  Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[15]  Sanjeev Arora,et al.  The Multiplicative Weights Update Method: a Meta-Algorithm and Applications , 2012, Theory Comput..

[16]  Nathan Srebro,et al.  SVM optimization: inverse dependence on training set size , 2008, ICML '08.