Online Kernel Learning with a Near Optimal Sparsity Bound

In this work, we focus on Online Sparse Kernel Learning that aims to online learn a kernel classifier with a bounded number of support vectors. Although many online learning algorithms have been proposed to learn a sparse kernel classifier, most of them fail to bound the number of support vectors used by the final solution which is the average of the intermediate kernel classifiers generated by online algorithms. The key idea of the proposed algorithm is to measure the difficulty in correctly classifying a training example by the derivative of a smooth loss function, and give a more chance to a difficult example to be a support vector than an easy one via a sampling scheme. Our analysis shows that when the loss function is smooth, the proposed algorithm yields similar performance guarantee as the standard online learning algorithm but with a near optimal number of support vectors (up to a poly(ln T) factor). Our empirical study shows promising performance of the proposed algorithm compared to the state-of-the-art algorithms for online sparse kernel learning.

[1]  Alexander J. Smola,et al.  Online learning with kernels , 2001, IEEE Transactions on Signal Processing.

[2]  Ji Zhu,et al.  Kernel Logistic Regression and the Import Vector Machine , 2001, NIPS.

[3]  Yoram Singer,et al.  Efficient Online and Batch Learning Using Forward Backward Splitting , 2009, J. Mach. Learn. Res..

[4]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT' 98.

[5]  Chun Chen,et al.  Efficient Online Learning for Large-scale Sparse Kernel Logistic Regression , 2022 .

[6]  Steven C. H. Hoi,et al.  Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning , 2012, ICML.

[7]  Bernhard Schölkopf,et al.  A Direct Method for Building Sparse Kernel Learning Algorithms , 2006, J. Mach. Learn. Res..

[8]  P. Bartlett,et al.  Local Rademacher complexities , 2005, math/0508275.

[9]  Claudio Gentile,et al.  On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.

[10]  Claudio Gentile,et al.  Tracking the best hyperplane with a simple budget Perceptron , 2006, Machine Learning.

[11]  Gábor Lugosi,et al.  Prediction, learning, and games , 2006 .

[12]  S. Sathiya Keerthi,et al.  Building Support Vector Machines with Reduced Classifier Complexity , 2006, J. Mach. Learn. Res..

[13]  Volker Roth,et al.  Probabilistic Discriminative Kernel Classifiers for Multi-class Problems , 2001, DAGM-Symposium.

[14]  Nathan Srebro,et al.  Learning Optimally Sparse Support Vector Machines , 2013, ICML.

[15]  Bernhard Schölkopf,et al.  Improving the accuracy and speed of support vector learning machines , 1997, NIPS 1997.

[16]  Yoram Singer,et al.  The Forgetron: A Kernel-Based Perceptron on a Budget , 2008, SIAM J. Comput..

[17]  Dale Schuurmans,et al.  implicit Online Learning with Kernels , 2006, NIPS.

[18]  Ambuj Tewari,et al.  Smoothness, Low Noise and Fast Rates , 2010, NIPS.

[19]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[20]  Koby Crammer,et al.  Online Classification on a Budget , 2003, NIPS.

[21]  Bernhard Schölkopf,et al.  Improving the Accuracy and Speed of Support Vector Machines , 1996, NIPS.

[22]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[23]  Yi Liu,et al.  SemiBoost: Boosting for Semi-Supervised Learning , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Yuh-Jye Lee,et al.  RSVM: Reduced Support Vector Machines , 2001, SDM.

[25]  Koby Crammer,et al.  Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training , 2012, J. Mach. Learn. Res..

[26]  Barbara Caputo,et al.  The projectron: a bounded kernel-based Perceptron , 2008, ICML '08.

[27]  John Langford,et al.  Sparse Online Learning via Truncated Gradient , 2008, NIPS.

[28]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..