A Linear Incremental Nyström Method for Online Kernel Learning

Although the incremental Nyström method has been used in kernel approximation, it is not suitable for online kernel learning because of its cubic time complexity and its lack of theoretical guarantees. In this paper, we propose a novel incremental Nyström method whose time complexity is linear in the sampling size at each round and which enjoys a sublinear regret bound for online kernel learning. We construct the intersection matrix using the ridge leverage score estimator, compute its rank-$k$ approximation incrementally via the incremental singular value decomposition, and recalculate the generalized inverse matrix periodically. When applying the proposed incremental Nyström method to online kernel learning, we approximate the kernel matrix using the updated generalized inverse matrix at each round, and formulate an explicit feature mapping from the singular value decomposition of the approximated kernel matrix, which yields a linear classifier for online kernel learning at each round. Theoretically, we prove that our incremental Nyström method has a $(1+\epsilon)$ relative-error bound for kernel matrix approximation, enjoys a sublinear regret bound for online kernel learning with online gradient descent, and reduces the time complexity of the generalized inverse computation from $O(m^{3})$ to $O(mk)$ at each round, where $m$ is the sampling size and $k$ is the truncated rank. Experimental results show that the proposed incremental Nyström method is accurate and efficient in kernel matrix approximation and is well suited to online kernel learning.
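The following is a minimal NumPy sketch of the Nyström construction described above, under simplifying assumptions: landmarks are assumed to be supplied by an external sampling rule (the ridge leverage score sampler is omitted), and the rank-$k$ factorization of the intersection matrix $W$ is recomputed from scratch at each periodic refresh rather than maintained with an incremental SVD, so this sketch does not attain the $O(mk)$ per-round cost of the paper's method. The names `IncrementalNystrom`, `rbf_kernel`, `add_landmark`, and `feature_map` are illustrative placeholders, not the paper's API.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class IncrementalNystrom:
    """Rank-k Nystrom approximation whose landmark set grows over rounds.

    For brevity, the rank-k factorization of the intersection matrix W is
    recomputed from scratch at each periodic refresh; the paper instead
    updates it with an incremental SVD and recalculates the generalized
    inverse only periodically.
    """

    def __init__(self, k, gamma=1.0, refresh_every=10):
        self.k = k                      # truncated rank
        self.gamma = gamma
        self.refresh_every = refresh_every
        self.landmarks = None           # all sampled landmark points, m x d
        self.active = None              # landmarks covered by the current factorization
        self.U = None                   # m x k top eigenvectors of W
        self.s = None                   # top-k eigenvalues of W
        self._since_refresh = 0

    def add_landmark(self, x):
        """Append a sampled point to the landmark set; refresh periodically."""
        x = np.asarray(x, dtype=float).reshape(1, -1)
        self.landmarks = x if self.landmarks is None else np.vstack([self.landmarks, x])
        self._since_refresh += 1
        if self.U is None or self._since_refresh >= self.refresh_every:
            self._refresh()

    def _refresh(self):
        """Recompute the rank-k eigendecomposition of W = K(landmarks, landmarks)."""
        self.active = self.landmarks.copy()
        W = rbf_kernel(self.active, self.active, self.gamma)
        s, U = np.linalg.eigh(W)                         # eigenvalues in ascending order
        idx = np.argsort(s)[::-1][: self.k]              # keep the k largest
        self.s = np.maximum(s[idx], 1e-12)
        self.U = U[:, idx]
        self._since_refresh = 0

    def feature_map(self, X):
        """Explicit Nystrom features phi(X) = K(X, landmarks) U_k S_k^{-1/2},
        so that phi(X) phi(X)^T = C W_k^+ C^T approximates K(X, X)."""
        C = rbf_kernel(np.atleast_2d(X), self.active, self.gamma)    # n x m
        return (C @ self.U) / np.sqrt(self.s)
```

The explicit feature map is what turns the kernel problem into a linear one: any linear model trained on `feature_map(X)` implicitly uses the approximated kernel matrix $\tilde{K} = C W_k^{+} C^{\top}$.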

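Building on the `IncrementalNystrom` sketch above, the next sketch shows how such an approximator could be plugged into an online-gradient-descent loop: at each round the incoming example is mapped to explicit Nyström features and a linear hinge-loss classifier takes a $1/\sqrt{t}$ step. The budgeted landmark growth and the re-initialization when the feature dimension changes are simplifications for illustration, not the paper's procedure.

```python
import numpy as np


def online_kernel_learning(stream, nystrom, eta=1.0, budget=100):
    """Online gradient descent over a stream of (x, y) pairs with y in {-1, +1},
    using explicit Nystrom features and a linear hinge-loss classifier."""
    w = None
    mistakes = 0
    for t, (x, y) in enumerate(stream, start=1):
        # Grow the landmark set up to a fixed budget (the paper's ridge
        # leverage score sampling rule is simplified away here).
        if nystrom.landmarks is None or nystrom.landmarks.shape[0] < budget:
            nystrom.add_landmark(x)
        phi = nystrom.feature_map(x).ravel()

        # The feature dimension is min(m, k), so it can change in early rounds;
        # a practical implementation would warm-start, here we re-initialize.
        if w is None or w.shape[0] != phi.shape[0]:
            w = np.zeros_like(phi)

        margin = y * float(w @ phi)
        if margin <= 0:
            mistakes += 1
        if margin < 1.0:                                 # hinge-loss subgradient step
            w = w + (eta / np.sqrt(t)) * y * phi
    return w, mistakes


if __name__ == "__main__":
    # Tiny synthetic run: a noisy linear concept in 5 dimensions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    labels = np.where(X[:, 0] + 0.3 * rng.normal(size=1000) > 0, 1.0, -1.0)
    model = IncrementalNystrom(k=20, gamma=0.5, refresh_every=25)
    w, mistakes = online_kernel_learning(zip(X, labels), model, budget=100)
    print(f"mistakes over 1000 rounds: {mistakes}")
```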