Nonparametric Budgeted Stochastic Gradient Descent

One of the most challenging problems in kernel online learning is bounding the model size. Budgeted kernel online learning addresses this issue by limiting the model size to a predefined budget. However, determining an appropriate value for such a predefined budget is difficult. In this paper, we propose Nonparametric Budgeted Stochastic Gradient Descent, which allows the model size to grow automatically with the data in a principled way. We provide a theoretical analysis showing that our framework is guaranteed to converge for a large collection of loss functions (e.g. Hinge, Logistic, L2, L1, and ε-insensitive). This enables the proposed algorithm to perform both classification and regression without sacrificing the ideal O(1/T) convergence rate of standard Stochastic Gradient Descent. We validate our algorithm on real-world datasets to support the theoretical claims.
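The abstract does not spell out how the nonparametric budget is maintained, so the sketch below does not implement it. It is a minimal, hypothetical illustration that assumes a plain unbudgeted kernel SGD in the Pegasos style on the objective (λ/2)‖w‖² + loss, with step size η_t = 1/(λt) and a Gaussian (RBF) kernel. It shows two points the abstract relies on. First, each listed loss (Hinge, Logistic, L2, L1, ε-insensitive) enters the update only through its (sub)derivative with respect to the prediction, so one update rule covers both classification and regression. Second, every example with a nonzero derivative becomes a new support vector, which is why the model size of unbudgeted kernel SGD keeps growing. All names (`rbf`, `KernelSGD`, `partial_fit`, the `*_grad` helpers) and default parameters are illustrative assumptions.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# (Sub)derivative of each loss with respect to the prediction f.
# Hinge and Logistic assume labels y in {-1, +1}; the others take real targets.
def hinge_grad(y, f):
    # loss max(0, 1 - y f); returns 0 at y f == 1 (a valid subgradient)
    return -float(y) if y * f < 1 else 0.0

def logistic_grad(y, f):
    # loss log(1 + exp(-y f)), written to avoid overflow in math.exp
    m = y * f
    if m >= 0:
        z = math.exp(-m)
        return -y * z / (1.0 + z)
    return -y / (1.0 + math.exp(m))

def l2_grad(y, f):
    return f - y  # loss 0.5 * (y - f)^2

def l1_grad(y, f):
    return float((f > y) - (f < y))  # loss |y - f|; sign(f - y), 0 at f == y

def eps_insensitive_grad(y, f, eps=0.1):
    # loss max(0, |y - f| - eps)
    return float((f > y) - (f < y)) if abs(y - f) > eps else 0.0

class KernelSGD:
    """Hypothetical unbudgeted kernel SGD: w = sum_i alpha_i * phi(x_i)."""

    def __init__(self, loss_grad, lam=0.1, gamma=1.0):
        self.loss_grad = loss_grad
        self.lam = lam
        self.gamma = gamma
        self.support = []  # list of [alpha, x] pairs
        self.t = 0

    def predict(self, x):
        return sum(a * rbf(sv, x, self.gamma) for a, sv in self.support)

    def partial_fit(self, x, y):
        self.t += 1
        eta = 1.0 / (self.lam * self.t)
        g = self.loss_grad(y, self.predict(x))  # gradient taken at the current w
        # Regularizer step: scale every coefficient by (1 - eta * lam) = (1 - 1/t).
        shrink = 1.0 - 1.0 / self.t
        for pair in self.support:
            pair[0] *= shrink
        # Loss step: a nonzero derivative adds x as a new support vector.
        if g != 0.0:
            self.support.append([-eta * g, x])
```

For instance, training `KernelSGD(hinge_grad, lam=0.5)` on the two points `([0.0], -1)` and `([3.0], 1)` yields two support vectors whose predictions have the correct sign at both points. Swapping in `logistic_grad` adds a support vector at every step of that small example, because the logistic derivative is nonzero at any moderate margin. This unbounded growth is the behaviour a budget is meant to control.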
