Sampling Methods for the Nyström Method

The Nyström method is an efficient technique for generating low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure by which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds that guarantee a better convergence rate for the ensemble Nyström method than for the standard Nyström method.
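
To make the setting concrete, here is a minimal sketch of the standard Nyström approximation with fixed (uniform) column sampling, together with a uniform-weight ensemble variant. The function names, the RBF toy kernel, and the uniform mixture weights are illustrative assumptions for exposition, not the exact samplers or weighting schemes evaluated in the paper.

```python
import numpy as np

def nystrom(K, m, rng):
    """One Nystrom approximation of an SPSD matrix K from m uniformly
    sampled columns: K_approx = C @ pinv(W) @ C.T, where C holds the
    sampled columns and W is their m x m intersection block."""
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # fixed (uniform) sampling
    C = K[:, idx]                               # n x m column block
    W = C[idx, :]                               # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T

def ensemble_nystrom(K, m, p, rng):
    """Ensemble Nystrom with uniform mixture weights: average p base
    approximations, each built from an independent column sample."""
    return sum(nystrom(K, m, rng) for _ in range(p)) / p

# Toy usage on a small RBF kernel matrix (illustrative data only).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 10.0)

err_base = np.linalg.norm(K - nystrom(K, 50, rng), "fro")
err_ens = np.linalg.norm(K - ensemble_nystrom(K, 50, 5, rng), "fro")
print(f"single: {err_base:.3f}  ensemble: {err_ens:.3f}")
```

On this toy problem the averaged approximation typically attains a smaller Frobenius-norm error than a single base approximation with the same per-learner sample size, matching the qualitative behavior the error bounds describe; an adaptive scheme would replace the uniform draw in `nystrom` with a data-dependent column-selection rule.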
