Adaptive Online Kernel Sampling for Vertex Classification

This paper studies online kernel learning (OKL) for the vertex classification problem on graphs, motivated by the fact that the large approximation space provided by reproducing kernel Hilbert spaces often contains an accurate function. Optimizing over this space, however, is computationally expensive. Approximate OKL addresses this issue by reducing the complexity, either by limiting the number of support vectors (SVs) used by the predictor or by avoiding kernelization altogether through an embedding. Nonetheless, as long as the size of the approximation space or the number of SVs does not grow over time, an adversarial environment can always exploit the approximation process. In this paper, we introduce online kernel sampling (OKS), a new second-order OKL method that slightly improves the bound from O(d log T) to O(r log T), where r is the rank of the learned data and is usually much smaller than d. To reduce the computational complexity of second-order methods, we introduce a randomized sampling algorithm for sketching the kernel matrix K_t and show that it significantly reduces time and space complexity while maintaining comparable performance. Experimental results demonstrate that the proposed model is highly effective on real-world graph datasets.
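To make the idea of sketching a kernel matrix by randomized sampling more concrete, the following is a minimal illustration that assumes a Gaussian (RBF) kernel and plain uniform column sampling, as in the Nyström method. It is not the OKS algorithm itself, whose sampling distribution and update rule are not specified in this abstract. The function names (rbf_kernel, nystrom_sketch, approx_kernel) and the parameters gamma, m, and seed are hypothetical and chosen only for this example.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_sketch(X, m, gamma=1.0, seed=0):
    """Sample m of the n points uniformly at random (an assumption of this
    sketch) and return the pieces of a rank-at-most-m kernel approximation."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)   # n x m block of sampled columns
    W = C[idx]                         # m x m block among the sampled points
    return C, np.linalg.pinv(W), idx

def approx_kernel(C, W_pinv):
    """Reconstruct the n x n approximation K_hat = C W^+ C^T."""
    return C @ W_pinv @ C.T

# Usage: store only C (n x m) and W^+ (m x m) instead of the full n x n matrix.
X = np.random.default_rng(1).standard_normal((20, 5))
C, W_pinv, idx = nystrom_sketch(X, m=5)
K_hat = approx_kernel(C, W_pinv)
```

In this illustration only the n x m block C and the m x m pseudo-inverse need to be stored, which is where the time and space savings of a sampled sketch come from. An adaptive scheme would replace the uniform choice of idx with a data-dependent sampling distribution.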
