Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets.

This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis.

The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the $k$ dominant components of the singular value decomposition of an $m \times n$ matrix.

(i) For a dense input matrix, randomized algorithms require $O(mn \log(k))$ floating-point operations (flops) in contrast to $O(mnk)$ for classical algorithms.

(ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures.

(iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to $O(k)$ passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
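To make the two-stage structure concrete, the following is a minimal sketch in Python/NumPy of the sample-then-compress idea described above: a Gaussian test matrix samples the range of the input, an orthonormal basis for that subspace is formed, and a small deterministic SVD of the compressed matrix yields the approximate factorization. The function name and the oversampling and power-iteration parameters `p` and `q` are illustrative choices for this sketch, not a definitive implementation of the paper's algorithms.

```python
import numpy as np

def randomized_svd(A, k, p=10, q=0):
    """Sketch of a randomized truncated SVD of A (m x n), rank k."""
    m, n = A.shape
    # Stage A: sample the range of A with a Gaussian test matrix (k + p columns).
    Omega = np.random.standard_normal((n, k + p))
    Y = A @ Omega
    # Optional power iterations sharpen the subspace when singular values decay slowly.
    for _ in range(q):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)            # orthonormal basis for the sampled range
    # Stage B: compress A to the subspace and factor the small matrix deterministically.
    B = Q.T @ A                       # (k + p) x n
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :k], s[:k], Vt[:k, :]
```

With a Gaussian test matrix this sketch costs roughly $O(mn(k+p))$ flops on a dense input; the $O(mn \log(k))$ figure quoted in the abstract requires replacing the Gaussian matrix with a structured random matrix, such as a subsampled randomized Fourier transform, so that the product $A\Omega$ can be applied quickly.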
