Randomized methods for computing low-rank approximations of matrices

Randomized sampling techniques have recently proved capable of efficiently solving many standard problems in linear algebra, enabling computations at scales far larger than were previously possible. The new algorithms are designed from the bottom up to perform well in modern computing environments where the cost of communication is the primary constraint. In extreme cases, the algorithms can even operate in a streaming environment where the matrix is never stored and each element is seen only once. The dissertation describes a set of randomized techniques for rapidly constructing a low-rank approximation to a matrix. The algorithms are presented in a modular two-stage framework: the first stage computes an approximation to the range of the matrix via randomized sampling; the second stage projects the matrix onto the approximate range and computes a factorization (SVD, QR, LU, etc.) of the resulting low-rank matrix via variations of classical deterministic methods. Theoretical performance bounds are provided. Particular attention is given to very large-scale computations where the matrix does not fit in the RAM of a single workstation. Algorithms are developed for the case where the original matrix must be stored out-of-core but the factors of the approximation fit in RAM. Numerical examples perform Principal Component Analysis of a data set so large that less than one hundredth of it fits in the RAM of a standard laptop computer. Furthermore, the dissertation presents a parallelized randomized scheme for computing a reduced-rank Singular Value Decomposition. By parallelizing and distributing both the randomized sampling stage and the processing of the factors in the approximate factorization, the method requires an amount of memory per node that is independent of both dimensions of the input matrix. Numerical experiments are performed on Hadoop clusters in Amazon's Elastic Compute Cloud with up to 64 total cores. Finally, the randomized algorithm is compared directly, in both performance and accuracy, with the classical Lanczos method on extremely large, sparse matrices, substantiating the claim that randomized methods are superior in this environment.
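To make the two-stage framework concrete, here is a minimal in-core sketch in Python/NumPy. It is not the dissertation's code: the function name `rsvd`, the oversampling parameter `p`, the power-iteration count `q`, and the test problem are all illustrative assumptions, and the out-of-core and Hadoop-distributed variants discussed above would organize the same two stages around block-wise passes over the matrix.

```python
# A minimal sketch of the two-stage randomized SVD described above,
# written with NumPy for a dense matrix that fits in RAM.  The names
# `rsvd`, `p` (oversampling), and `q` (power iterations) are
# illustrative choices, not taken from the source.
import numpy as np

def rsvd(A, k, p=10, q=2, seed=None):
    """Approximate rank-k SVD of A via a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Stage 1: sample the range of A with a Gaussian test matrix,
    # then orthonormalize the sample to get a basis Q for the range.
    Omega = rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega)
    # Optional power iterations sharpen the basis when the singular
    # values of A decay slowly (re-orthonormalizing for stability).
    for _ in range(q):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Stage 2: project A onto the approximate range and factor the
    # small (k+p) x n matrix B with a classical deterministic SVD.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub  # lift the left factor back to the original m-dim space
    return U[:, :k], s[:k], Vt[:k, :]

# Usage: recover a numerically rank-50 matrix to near machine precision.
rng = np.random.default_rng(0)
m, n, r = 2000, 1000, 50
A = (rng.standard_normal((m, r)) * 0.8 ** np.arange(r)) @ rng.standard_normal((r, n))
U, s, Vt = rsvd(A, k=r, seed=1)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```

Note the design point the abstract emphasizes: the expensive, communication-bound work is confined to the matrix-matrix products with the random test matrix, which need only a small constant number of passes over A, while the deterministic factorization touches only the small projected matrix B.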
