Less is More: Nyström Computational Regularization

We study Nyström-type subsampling approaches to large-scale kernel methods and prove learning bounds in the statistical learning setting, where random sampling and high-probability estimates are considered. In particular, we prove that these approaches can achieve optimal learning bounds, provided the subsampling level is suitably chosen. These results suggest a simple incremental variant of Nyström Kernel Regularized Least Squares, where the subsampling level implements a form of computational regularization, in the sense that it simultaneously controls regularization and computation. Extensive experimental analysis shows that the considered approach achieves state-of-the-art performance on benchmark large-scale datasets.
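To make the approach concrete, below is a minimal Python sketch of Nyström kernel regularized least squares with uniform subsampling, under stated assumptions: the function names (gaussian_kernel, nystrom_krls_fit, nystrom_krls_predict) and the parameters m (subsampling level), lam (ridge parameter), and sigma (kernel width) are illustrative choices, not code from the paper, and the incremental variant is not reproduced here.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between the rows of A and B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def nystrom_krls_fit(X, y, m, lam, sigma=1.0, seed=None):
    # Fit Nystrom KRLS: pick m centers uniformly at random, then solve
    # the m x m regularized system instead of the full n x n one.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # uniform subsampling
    Xm = X[idx]
    Knm = gaussian_kernel(X, Xm, sigma)         # n x m cross-kernel
    Kmm = gaussian_kernel(Xm, Xm, sigma)        # m x m kernel on centers
    # Normal equations: (Knm' Knm + n * lam * Kmm) alpha = Knm' y.
    A = Knm.T @ Knm + n * lam * Kmm
    alpha = np.linalg.solve(A + 1e-10 * np.eye(m), Knm.T @ y)  # small jitter for stability
    return Xm, alpha

def nystrom_krls_predict(X_new, Xm, alpha, sigma=1.0):
    # Prediction is a kernel expansion over the m subsampled centers only.
    return gaussian_kernel(X_new, Xm, sigma) @ alpha
```

In this sketch the subsampling level m is the knob the abstract refers to: increasing m enlarges the hypothesis space and raises the cost of forming and solving the m x m system, while decreasing m acts as an additional form of regularization alongside lam.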
