Improving the modified Nyström method using spectral shifting

The Nyström method is an efficient approach to enabling large-scale kernel methods. It generates a fast approximation to any large-scale symmetric positive semidefinite (SPSD) matrix using only a few columns of that matrix. However, since the Nyström approximation is low-rank, it has low accuracy when the spectrum of the SPSD matrix decays slowly. In this paper, we propose a variant of the Nyström method called the modified Nyström method by spectral shifting (SS-Nyström). The SS-Nyström method works well whether the spectrum of the SPSD matrix decays quickly or slowly. We prove that SS-Nyström has a much stronger error bound than the standard and modified Nyström methods, and that in some cases it can be even more accurate than the truncated SVD of the same scale. We also devise an algorithm by which the SS-Nyström approximation can be computed nearly as efficiently as the modified Nyström approximation. Finally, SS-Nyström demonstrates significant improvements over the standard and modified Nyström methods on several real-world datasets.
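As a concrete illustration of the two ideas above, the sketch below (Python/NumPy, not taken from the paper) contrasts the standard Nyström approximation with a spectral-shifting variant: a shift delta is subtracted from the diagonal before the low-rank step and added back afterwards, so the heavy tail of the spectrum is not lost. The function names, the uniform column sampling, and the particular choice of delta (the mean of the trailing eigenvalue mass, computed here from a full eigendecomposition for clarity) are illustrative assumptions; the paper's algorithm computes its shift and intersection matrix far more cheaply.

```python
import numpy as np

def nystrom(K, idx):
    """Standard Nystrom approximation K ~= C W^+ C^T.

    K   : (n, n) SPSD matrix.
    idx : indices of the sampled columns.
    """
    C = K[:, idx]                        # sampled columns
    W = C[idx, :]                        # intersection block
    return C @ np.linalg.pinv(W) @ C.T

def ss_nystrom(K, idx, k):
    """Illustrative spectral-shifting variant (simplified assumption,
    not the paper's exact algorithm).

    Subtracts delta*I before a modified-Nystrom-style low-rank step
    and adds delta*I back, so the spectral tail is captured even
    though the sketch itself is low-rank.
    """
    n = K.shape[0]
    # Shift = mean eigenvalue mass beyond the top k. Computed exactly
    # here for clarity; the paper estimates the shift cheaply.
    eigs = np.linalg.eigvalsh(K)[::-1]          # descending order
    delta = max(eigs[k:].sum() / (n - k), 0.0)

    K_shift = K - delta * np.eye(n)
    C = K_shift[:, idx]
    Cp = np.linalg.pinv(C)
    U = Cp @ K_shift @ Cp.T                     # modified-Nystrom core
    return C @ U @ C.T + delta * np.eye(n)

# Tiny demo on a synthetic SPSD matrix with a slowly decaying spectrum.
rng = np.random.default_rng(0)
n, c, k = 200, 20, 10
A = rng.standard_normal((n, n))
K = A @ A.T / n + np.eye(n)                     # heavy spectral tail
idx = rng.choice(n, size=c, replace=False)

err_std = np.linalg.norm(K - nystrom(K, idx), "fro")
err_ss = np.linalg.norm(K - ss_nystrom(K, idx, k), "fro")
print(f"standard Nystrom error: {err_std:.3f}")
print(f"spectral-shift error:   {err_ss:.3f}")
```

The design point the sketch makes is that the shift term delta*I is represented exactly, so the low-rank factor only needs to capture the residual K - delta*I; when the trailing eigenvalues are far from zero, this is exactly where a plain low-rank approximation loses accuracy.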
