Dependent Online Kernel Learning With Constant Number of Random Fourier Features

Traditional analyses of online kernel learning assume that the training sequence is independently and identically distributed (i.i.d.). Recent studies show that when the loss function is smooth and strongly convex, given T i.i.d. training instances, a constant number of random Fourier features suffices to guarantee an O(log T / T) convergence rate of the excess risk, which is optimal for online kernel learning up to a log T factor. However, the i.i.d. assumption is often too strong in practice, which greatly limits the applicability of these results. In this paper, we study the sampling complexity of random Fourier features in online kernel learning under non-i.i.d. assumptions. We prove that the sampling complexity remains constant in the non-i.i.d. setting, while the convergence rate of the excess risk becomes O(log T / T + φ), where φ is the mixing coefficient measuring the degree of dependence in the training sequence. We conduct experiments on both artificial and real-world large-scale data sets to verify our theory.
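As background for the construction the abstract refers to, a random Fourier feature map for a shift-invariant kernel can be sketched as below for the Gaussian kernel k(x, y) = exp(-||x - y||² / (2σ²)); the function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, D, sigma):
    """Map each row of X to D random Fourier features whose inner
    products approximate the Gaussian kernel with bandwidth sigma."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's Fourier transform N(0, sigma^-2 I)
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    # Random phases from Uniform[0, 2*pi)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    # z(x) = sqrt(2/D) * cos(W^T x + b), so E[z(x)^T z(y)] = k(x, y)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Usage: map two points with the same random features and compare
# the inner product of the features to the exact kernel value.
x = rng.normal(size=(1, 5))
y = rng.normal(size=(1, 5))
Z = rff_features(np.vstack([x, y]), D=5000, sigma=1.0)
approx = float(Z[0] @ Z[1])
exact = float(np.exp(-np.sum((x - y) ** 2) / 2.0))
```

The error of the approximation decays at rate O(1/√D), which is why a fixed feature dimension D can be traded off against the target excess-risk rate.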
