Double Nyström Method: An Efficient and Accurate Nyström Scheme for Large-Scale Data Sets

The Nyström method has been one of the most effective techniques for kernel-based approaches that scale to large data sets. Since its introduction, a large body of work has improved its approximation accuracy while maintaining computational efficiency. In this paper, we present a novel Nyström method that improves both accuracy and efficiency based on a new theoretical analysis. We first provide a generalized sampling scheme, CAPS, that minimizes a novel error bound based on the subspace distance. We then present our double Nyström method, which reduces the size of the decomposition in two stages. We show that our method is highly efficient and accurate compared to other state-of-the-art Nyström methods by evaluating them on a number of real data sets.
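For context on the baseline being improved here, the sketch below shows the standard single-stage Nyström approximation of an n x n PSD kernel matrix K: sample m landmark columns C and the corresponding m x m intersection block W, then approximate K ≈ C W⁺ Cᵀ. This is background only, not the paper's CAPS sampling or double Nyström scheme; the uniform landmark sampling and the function name nystrom_approximation are illustrative assumptions.

```python
import numpy as np

def nystrom_approximation(K, m, rng=None):
    """Standard single-stage Nystrom approximation of an n x n PSD kernel
    matrix K using m uniformly sampled landmarks.

    Returns C (n x m) and W_pinv (m x m) with K ~= C @ W_pinv @ C.T.
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # uniform landmark sampling
    C = K[:, idx]                               # sampled columns of K
    W = C[idx, :]                               # m x m intersection block
    W_pinv = np.linalg.pinv(W)                  # pseudo-inverse of W
    return C, W_pinv

# Usage: approximate an RBF kernel on synthetic data.
X = np.random.default_rng(0).standard_normal((1000, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
C, W_pinv = nystrom_approximation(K, m=100, rng=0)
K_approx = C @ W_pinv @ C.T
print(np.linalg.norm(K - K_approx, "fro") / np.linalg.norm(K, "fro"))
```

The approximation needs only O(nm) kernel evaluations plus an O(m³) pseudo-inverse, versus O(n²) storage and O(n³) eigendecomposition for the exact kernel; the choice of landmarks (uniform here) is exactly what sampling schemes such as CAPS aim to improve.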
