SVM optimization: inverse dependence on training set size

We discuss how the runtime of SVM optimization should decrease as the size of the training data increases. We present theoretical and empirical results demonstrating how a simple subgradient descent approach indeed displays such behavior, at least for linear kernels.
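To make the "simple subgradient descent approach" concrete, the sketch below applies stochastic subgradient descent to the standard L2-regularized hinge-loss objective for a linear SVM. This is a minimal illustration under that assumption, not the paper's exact algorithm; the function name, step-size schedule, and parameters are illustrative.

```python
import numpy as np

def subgradient_svm(X, y, lam=0.01, epochs=10, seed=0):
    """Stochastic subgradient descent on the regularized hinge loss:
        lam/2 * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i * <w, x_i>)
    X: (m, d) array of examples; y: (m,) array of labels in {-1, +1}.
    (Illustrative sketch; not the authors' exact implementation.)
    """
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(m):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * X[i].dot(w)
            # Subgradient step on the sampled example: shrink w for the
            # regularizer, and add the hinge-loss term only if the margin
            # constraint is violated.
            if margin < 1:
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

# Usage (hypothetical data): w = subgradient_svm(X_train, y_train, lam=1e-3)
# Predict with np.sign(X_test.dot(w)).
```

Because each update touches a single example, the per-iteration cost is independent of the training set size, which is what allows overall runtime to improve rather than degrade as more data becomes available.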
