Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning

Kernel-based online learning has often shown state-of-the-art performance on many online learning tasks. It suffers, however, from a major shortcoming: the number of support vectors is unbounded, which makes it nonscalable and unsuitable for applications with large-scale datasets. In this work, we study the problem of bounded kernel-based online learning, which aims to constrain the number of support vectors by a predefined budget. Although several algorithms have been proposed in the literature, they are either too computationally expensive, due to their intensive budget maintenance strategies, or ineffective, due to their reliance on the simple Perceptron algorithm. To overcome these limitations, we propose a framework for bounded kernel-based online learning based on an online gradient descent approach. We propose two efficient bounded online gradient descent (BOGD) algorithms for scalable kernel-based online learning: (i) BOGD, which maintains the set of support vectors by uniform sampling, and (ii) BOGD++, which maintains it by nonuniform sampling. We present a theoretical analysis of the regret bounds for both algorithms and find promising empirical performance, in terms of both efficacy and efficiency, when comparing them against several well-known algorithms for bounded kernel-based online learning on large-scale datasets.
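To make the uniform-sampling variant concrete, the following Python code is a minimal sketch of budgeted kernel online gradient descent in the style the abstract describes: a Pegasos-style hinge-loss gradient step, with one support vector discarded uniformly at random whenever the budget is exceeded. The class name BOGD, the RBF kernel, the hyperparameter names, and the B/(B-1) rescaling of the surviving coefficients are illustrative assumptions and may differ from the paper's exact update rule.

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=1.0):
    """RBF kernel between two feature vectors."""
    diff = x1 - x2
    return np.exp(-gamma * diff.dot(diff))

class BOGD:
    """Sketch of budgeted kernel online gradient descent
    with uniform-sampling budget maintenance (illustrative,
    not the paper's exact formulation)."""

    def __init__(self, budget=100, eta=0.1, lam=1e-4, gamma=1.0):
        self.budget = budget  # max number of support vectors B (assume B >= 2)
        self.eta = eta        # learning rate
        self.lam = lam        # L2 regularization parameter
        self.gamma = gamma    # RBF kernel width
        self.sv_x = []        # support vectors
        self.alpha = []       # their coefficients

    def predict_score(self, x):
        return sum(a * rbf_kernel(sv, x, self.gamma)
                   for a, sv in zip(self.alpha, self.sv_x))

    def fit_one(self, x, y):
        """One online round with label y in {-1, +1}; returns the prediction."""
        score = self.predict_score(x)
        # Gradient step on the regularizer (lam/2) * ||f||^2
        # shrinks all existing coefficients.
        self.alpha = [(1 - self.eta * self.lam) * a for a in self.alpha]
        # Hinge-loss subgradient step: add x as a new support
        # vector whenever a margin error occurs.
        if y * score < 1:
            if len(self.sv_x) >= self.budget:
                # Uniform sampling: discard one support vector at
                # random, then rescale the survivors by B/(B-1) so the
                # budgeted update stays unbiased in expectation.
                i = np.random.randint(len(self.sv_x))
                del self.sv_x[i]
                del self.alpha[i]
                scale = self.budget / (self.budget - 1)
                self.alpha = [scale * a for a in self.alpha]
            self.sv_x.append(x)
            self.alpha.append(self.eta * y)
        return 1.0 if score >= 0 else -1.0
```

For a data stream, usage would look like: model = BOGD(budget=50); for x, y in stream: model.fit_one(x, y). The key design point is that, unlike Perceptron-based budget methods that search for the "best" support vector to remove, uniform sampling makes each budget-maintenance step O(1), which is what makes the approach scalable.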
