Gain with no Pain: Efficiency of Kernel-PCA by Nyström Sampling

In this paper, we analyze a Nyström-based approach to efficient large-scale kernel principal component analysis (PCA). Kernel PCA is a natural nonlinear extension of classical PCA, obtained by working with a nonlinear feature map or, equivalently, the corresponding kernel. Like other kernel approaches, kernel PCA enjoys good mathematical and statistical properties, but numerically it scales poorly with the sample size. Our analysis shows that Nyström sampling greatly improves computational efficiency without incurring any loss of statistical accuracy. While similar effects have been observed in supervised learning, this is the first such result for PCA. Our theoretical findings are based on a combination of analytic and concentration-of-measure techniques. More broadly, our study is motivated by the question of understanding the interplay between statistical and computational requirements for learning.
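
To make the approach concrete, the following is a minimal sketch of Nyström kernel PCA, not the exact procedure analyzed in the paper. It assumes a Gaussian kernel, landmarks drawn uniformly at random without replacement, and uncentered PCA; the names `gaussian_kernel` and `nystrom_kpca` and the parameters `sigma` and `tol` are illustrative choices. The idea is to restrict the principal-component search to the span of m landmark points, so that only an n × m kernel block and an m × m eigenproblem are needed instead of the full n × n kernel matrix.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def nystrom_kpca(X, m, k, sigma=1.0, tol=1e-10, seed=0):
    """Top-k kernel PCA restricted to the span of m uniformly sampled landmarks.

    Returns the top-k eigenvalues (descending) and a function that maps
    new points to their k principal-component scores.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # uniform Nystrom sampling
    landmarks = X[idx]

    K_mm = gaussian_kernel(landmarks, landmarks, sigma)   # m x m
    K_nm = gaussian_kernel(X, landmarks, sigma)           # n x m

    # Pseudo-inverse square root of K_mm; tiny eigenvalues are discarded
    # for numerical stability.
    w, V = np.linalg.eigh(K_mm)
    keep = w > tol * w.max()
    inv_sqrt = (V[:, keep] / np.sqrt(w[keep])) @ V[:, keep].T

    # Coordinates of every point in an orthonormal basis of the landmark span.
    Z = K_nm @ inv_sqrt                                   # n x m

    # Ordinary (uncentered) PCA on the m-dimensional coordinates.
    evals, evecs = np.linalg.eigh(Z.T @ Z / n)
    order = np.argsort(evals)[::-1][:k]
    evals, evecs = evals[order], evecs[:, order]

    def transform(X_new):
        return gaussian_kernel(X_new, landmarks, sigma) @ inv_sqrt @ evecs

    return evals, transform


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(500, 5))
    evals, transform = nystrom_kpca(X, m=50, k=3)
    print(evals)
    print(transform(X[:4]).shape)
```

In this sketch the cost is dominated by forming the n × m block and the m × m eigendecompositions, roughly O(nm² + m³) operations, compared with the n × n eigenproblem of exact kernel PCA. When m = n the landmark span contains all the data and the procedure reduces to exact (uncentered) kernel PCA.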
