Sparse Passive-Aggressive Learning for Bounded Online Kernel Methods

A critical deficiency of traditional online kernel learning methods is that their number of support vectors grows without bound during online learning, which makes them inefficient and hard to scale to large applications. Recent studies on scalable online kernel learning have tried to overcome this shortcoming, e.g., by imposing a constant budget on the number of support vectors. Although these methods bound the number of support vectors at each online learning iteration, most of them fail to bound the number of support vectors in the final output hypothesis, which is often obtained by averaging the hypotheses over all iterations. In this article, we propose a novel framework for bounded online kernel methods, named “Sparse Passive-Aggressive (SPA)” learning, which yields a final kernel-based output hypothesis with a bounded number of support vectors. Unlike the budget maintenance strategy common to many existing budgeted online kernel learning approaches, our approach bounds the number of support vectors with an efficient stochastic sampling strategy: an incoming training example is sampled as a new support vector with a probability proportional to the loss it suffers. We theoretically prove that SPA achieves an optimal mistake bound in expectation, and we empirically show that it outperforms various budgeted online kernel learning algorithms. Finally, in addition to general online kernel learning tasks, we apply SPA to derive bounded online multiple-kernel learning algorithms, which significantly improve the scalability of traditional Online Multiple-Kernel Classification (OMKC) algorithms while achieving satisfactory learning accuracy compared with the existing unbounded OMKC algorithms.
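The loss-proportional sampling idea can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `SparsePASketch`, the parameters `alpha`, `beta`, and `C`, the clipping of the sampling probability at `alpha / beta`, the passive-aggressive style step size, and the RBF kernel are all hypothetical choices made for the example. The abstract itself only states that an example becomes a support vector with probability proportional to its loss.

```python
import math
import random


def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class SparsePASketch:
    """Illustrative online kernel classifier with loss-proportional sampling.

    All parameter names and the exact update rule are assumptions made
    for this sketch; they are not taken from the abstract.
    """

    def __init__(self, alpha=1.0, beta=2.0, C=1.0, kernel=rbf, seed=0):
        assert 0 < alpha <= beta, "need 0 < alpha <= beta so the probability is <= 1"
        self.alpha, self.beta, self.C = alpha, beta, C
        self.kernel = kernel
        self.rng = random.Random(seed)
        self.support = []  # list of (coefficient, example) pairs

    def score(self, x):
        """Kernel expansion f(x) = sum_i coef_i * k(s_i, x)."""
        return sum(coef * self.kernel(s, x) for coef, s in self.support)

    def partial_fit(self, x, y):
        """Process one example with label y in {-1, +1}.

        Returns True if the example was kept as a new support vector.
        """
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss
        if loss == 0.0:
            return False  # passive step: no loss, no update
        # Sampling probability grows with the loss, clipped at alpha / beta.
        prob = min(self.alpha, loss) / self.beta
        if self.rng.random() >= prob:
            return False  # example not sampled: hypothesis unchanged
        # Aggressive step, rescaled by 1 / prob to offset the skipped updates.
        tau = min(self.C / prob, loss / self.kernel(x, x))
        self.support.append((tau * y, x))
        return True
```

In this sketch the support set grows only when a sampled example is kept, so its size is governed by the sampling probability rather than by a separate removal or projection step, which is the contrast with budget maintenance strategies drawn above.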

[1]  Rong Jin,et al.  Online Multiple Kernel Classification , 2013, Machine Learning.

[2]  Yung C. Shin,et al.  Sparse Multiple Kernel Learning for Signal Processing Applications , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[4]  Koby Crammer,et al.  Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training , 2012, J. Mach. Learn. Res..

[5]  Michael Rabadi,et al.  Kernel Methods for Machine Learning , 2015 .

[6]  Alexander J. Smola,et al.  Online learning with kernels , 2001, IEEE Transactions on Signal Processing.

[7]  Slobodan Vucetic,et al.  Online Passive-Aggressive Algorithms on a Budget , 2010, AISTATS.

[8]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT' 98.

[9]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[10]  Colin Campbell,et al.  Kernel methods: a survey of current techniques , 2002, Neurocomputing.

[11]  Ethem Alpaydin,et al.  Multiple Kernel Learning Algorithms , 2011, J. Mach. Learn. Res..

[12]  Barbara Caputo,et al.  Multi Kernel Learning with Online-Batch Optimization , 2012, J. Mach. Learn. Res..

[13]  Slobodan Vucetic,et al.  Twin Vector Machines for Online Learning on a Budget , 2009, SDM.

[14]  Bernhard Schölkopf,et al.  Kernel Principal Component Analysis , 1997, ICANN.

[15]  Steven C. H. Hoi,et al.  Online Sparse Passive Aggressive Learning with Kernels , 2016, SDM.

[16]  Claudio Gentile,et al.  On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.

[17]  Rong Jin,et al.  Double Updating Online Learning , 2011, J. Mach. Learn. Res..

[18]  Shai Shalev-Shwartz,et al.  Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..

[19]  Barbara Caputo,et al.  Bounded Kernel-Based Online Learning , 2009, J. Mach. Learn. Res..

[20]  Yoram Singer,et al.  The Forgetron: A Kernel-Based Perceptron on a Budget , 2008, SIAM J. Comput..

[21]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[22]  Ohad Shamir,et al.  Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes , 2012, ICML.

[23]  Francis R. Bach,et al.  Consistency of the group Lasso and multiple kernel learning , 2007, J. Mach. Learn. Res..

[24]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[25]  Ning Chen,et al.  Mobile App Tagging , 2016, WSDM '16.

[26]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[27]  Koby Crammer,et al.  Online Classification on a Budget , 2003, NIPS.

[28]  Eric P. Xing,et al.  Online Learning of Structured Predictors with Multiple Kernels , 2011, AISTATS.

[29]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[30]  Manik Varma,et al.  More generality in efficient multiple kernel learning , 2009, ICML '09.

[31]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[32]  Francesco Orabona,et al.  OM-2: An online multi-class Multi-Kernel Learning algorithm Luo Jie , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[33]  Steven C. H. Hoi,et al.  Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning , 2012, ICML.

[34]  Rong Jin,et al.  Online Multiple Kernel Learning: Algorithms and Mistake Bounds , 2010, ALT.

[35]  Claudio Gentile,et al.  Tracking the best hyperplane with a simple budget Perceptron , 2006, Machine Learning.

[36]  Y. Singer,et al.  Ultraconservative online algorithms for multiclass problems , 2003 .

[37]  Barbara Caputo,et al.  Online-batch strongly convex Multi Kernel Learning , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2004 .

[39]  Xuan Li,et al.  Association of tissue lineage and gene expression: conservatively and differentially expressed genes define common and special functions of tissues , 2010, BMC Bioinformatics.

[40]  Ofer Dekel From Online to Batch Learning with Cutoff-Averaging , 2008, NIPS.

[41]  Bernhard Schölkopf,et al.  Learning with kernels , 2001 .

[42]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[43]  Yoram Singer,et al.  Data-Driven Online to Batch Conversions , 2005, NIPS.

[44]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[45]  Gunnar Rätsch,et al.  Large Scale Multiple Kernel Learning , 2006, J. Mach. Learn. Res..

[46]  Bin Li,et al.  Online multiple kernel regression , 2014, KDD.

[47]  Chun Chen,et al.  Efficient Online Learning for Large-scale Sparse Kernel Logistic Regression , 2022 .

[48]  Steven C. H. Hoi,et al.  LIBOL: a library for online learning algorithms , 2014, J. Mach. Learn. Res..

[49]  Jinfeng Yi,et al.  Online Kernel Learning with a Near Optimal Sparsity Bound , 2013, ICML.

[50]  Yves Grandvalet,et al.  More efficiency in multiple kernel learning , 2007, ICML '07.

[51]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[52]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[53]  Rong Jin,et al.  Online Multiple Kernel Similarity Learning for Visual Search , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  William Stafford Noble,et al.  Support vector machine , 2013 .

[55]  Ning Chen,et al.  SimApp: A Framework for Detecting Similar Mobile Applications by Online Kernel Learning , 2015, WSDM.

[56]  Johan A. K. Suykens,et al.  L2-norm multiple kernel learning and its application to biomedical data fusion , 2010, BMC Bioinformatics.