Large-scale Online Kernel Learning with Random Feature Reparameterization

A typical online kernel learning method faces two fundamental issues: the complexity of dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty of learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach that addresses the former by approximating a shift-invariant kernel via Bochner's theorem, allowing the model to be maintained directly in a random feature space of fixed dimension, so that the model size remains constant with respect to the data size. In this paper, we introduce the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both of the aforementioned challenges. Our initial intuition comes from the so-called reparameterization trick [Kingma and Welling, 2014]: we lift the source of randomness of the Fourier components to another space that can be sampled independently of the kernel parameters, so that a stochastic gradient with respect to the kernel parameters can be derived analytically. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel and a new, tighter error bound on the approximation quality. This view further enables a direct application of stochastic gradient descent to update our model in an online learning setting. We conduct extensive experiments on several large-scale datasets and demonstrate that our method achieves state-of-the-art performance in both learning efficacy and efficiency.
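
To make the reparameterization concrete, the sketch below (ours, not the paper's implementation) instantiates the idea for the Gaussian RBF kernel with a hinge loss: the random Fourier frequencies are written as a fixed standard-normal sample divided by the kernel width sigma, so the source of randomness no longer depends on sigma and both the output weights and sigma can be updated by stochastic gradient descent as examples arrive. The function name, step sizes, and the choice to optimize log sigma are illustrative assumptions.

import numpy as np

def rrf_online_sketch(stream, d, D=256, lr_w=0.1, lr_sigma=0.01, seed=0):
    """Online binary classification with reparameterized random features
    for the RBF kernel k(x, x') = exp(-||x - x'||^2 / (2 * sigma^2)).

    The Fourier frequencies are written as W = E / sigma with E ~ N(0, I)
    drawn once, so the randomness is independent of the kernel width sigma
    and the gradient w.r.t. sigma is available in closed form.
    """
    rng = np.random.default_rng(seed)
    E = rng.standard_normal((D, d))            # fixed source of randomness
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)  # random phases
    w = np.zeros(D)                            # linear model in feature space
    log_sigma = 0.0                            # optimize sigma on the log scale

    for x, y in stream:                        # x: (d,) array, y in {-1, +1}
        sigma = np.exp(log_sigma)
        proj = E @ x
        u = proj / sigma + b
        z = np.sqrt(2.0 / D) * np.cos(u)       # random Fourier feature map
        margin = y * (w @ z)

        if margin < 1.0:                       # hinge loss is active
            grad_w = -y * z
            # dz/dsigma = sqrt(2/D) * sin(u) * proj / sigma^2 (chain rule on cos)
            dz_dsigma = np.sqrt(2.0 / D) * np.sin(u) * proj / sigma**2
            grad_log_sigma = -y * (w @ dz_dsigma) * sigma  # d sigma / d log sigma = sigma
            w -= lr_w * grad_w
            log_sigma -= lr_sigma * grad_log_sigma

    return w, np.exp(log_sigma)

As a usage example, the stream could be supplied as zip(X, y) over a dataset read in order; other shift-invariant kernels would follow the same pattern with a different reparameterization of the frequency distribution.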

[1] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes, 2014, ICLR.

[2] Slobodan Vucetic et al. Twin Vector Machines for Online Learning on a Budget, 2009, SDM.

[3] Neil D. Lawrence et al. Gaussian Processes for Big Data, 2013, UAI.

[4] Alexander J. Smola et al. Online learning with kernels, 2001, IEEE Transactions on Signal Processing.

[5] Yoram Singer et al. The Forgetron: A Kernel-Based Perceptron on a Fixed Budget, 2005, NIPS.

[6] Trung Le et al. Nonparametric Budgeted Stochastic Gradient Descent, 2016, AISTATS.

[7] Cristian Sminchisescu et al. Fourier Kernel Learning, 2012, ECCV.

[8] Sayan Mukherjee et al. Choosing Multiple Parameters for Support Vector Machines, 2002, Machine Learning.

[9] Andrew Gordon Wilson et al. Gaussian Process Kernels for Pattern Discovery and Extrapolation, 2013, ICML.

[10] Steven C. H. Hoi et al. Large Scale Online Kernel Learning, 2016, J. Mach. Learn. Res.

[11] Steven C. H. Hoi et al. Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning, 2012, ICML.

[12] Barnabás Póczos et al. Bayesian Nonparametric Kernel-Learning, 2015, AISTATS.

[13] Claudio Gentile et al. Tracking the best hyperplane with a simple budget Perceptron, 2006, Machine Learning.

[14] Alexander J. Smola et al. Fastfood: Approximate Kernel Expansions in Loglinear Time, 2014, arXiv.

[15] Koby Crammer et al. Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training, 2012, J. Mach. Learn. Res.

[16] Trung Le et al. Multiple Kernel Learning with Data Augmentation, 2016, ACML.

[17] Ali Rahimi and Benjamin Recht. Random Features for Large-Scale Kernel Machines, 2007, NIPS.

[18] Slobodan Vucetic et al. Online Passive-Aggressive Algorithms on a Budget, 2010, AISTATS.

[19] Koby Crammer et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, 2002, J. Mach. Learn. Res.

[20] Barbara Caputo et al. Bounded Kernel-Based Online Learning, 2009, J. Mach. Learn. Res.

[21] Le Song et al. A la Carte - Learning Fast Kernels, 2014, AISTATS.

[22] Trung Le et al. Dual Space Gradient Descent for Online Learning, 2016, NIPS.

[23] Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.
