Streaming Kernel PCA with $\tilde{O}(\sqrt{n})$ Random Features

We study the statistical and computational aspects of kernel principal component analysis using random Fourier features and show that, under mild assumptions, $O(\sqrt{n} \log n)$ features suffice to achieve $O(1/\epsilon^2)$ sample complexity. Furthermore, we give a memory-efficient streaming algorithm based on the classical Oja's algorithm that achieves this rate.
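The approach described above combines two standard ingredients: an explicit random Fourier feature map that approximates a shift-invariant kernel, and Oja's stochastic update applied in that feature space. The following is a minimal illustrative sketch, not the paper's actual algorithm; the function names, the RBF kernel choice, the step-size schedule `lr / sqrt(t)`, and the per-step QR re-orthonormalization are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(X, W, b):
    # Random Fourier features for the RBF kernel: z(x) = sqrt(2/m) * cos(W x + b),
    # so that z(x) . z(y) approximates k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.cos(X @ W.T + b)

def streaming_kernel_pca(X, k=2, m=64, sigma=1.0, lr=0.1):
    # One pass over the rows of X; memory is O(m * k), independent of n.
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(m, d))  # samples from the kernel's spectral density
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)       # random phase offsets
    U, _ = np.linalg.qr(rng.normal(size=(m, k)))    # orthonormal initialization
    for t, x in enumerate(X, start=1):
        z = rff_map(x[None, :], W, b).ravel()       # lift the sample into feature space
        eta = lr / np.sqrt(t)                       # decaying step size (an assumption)
        U += eta * np.outer(z, z @ U)               # Oja's rank-k update
        U, _ = np.linalg.qr(U)                      # re-orthonormalize the basis
    return U, (W, b)

# Usage: estimate the top-2 kernel principal subspace of a small stream.
X = rng.normal(size=(500, 5))
U, (W, b) = streaming_kernel_pca(X, k=2, m=64)
```

The returned `U` spans an estimate of the top-$k$ principal subspace of the (approximate) kernel covariance; new points are projected via `rff_map(x, W, b) @ U`.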
