Paper information - Provable deterministic leverage score sampling

Provable deterministic leverage score sampling

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption by providing empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which often matches or outperforms state-of-the-art techniques.
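The selection rule described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the usual definition of rank-k leverage scores (the squared Euclidean norms of the rows of the matrix holding the top-k right singular vectors). The function names and the parameter `c` (number of columns kept) are illustrative assumptions, not taken from the paper; the abstract does not state how the number of columns is chosen, so here it is simply supplied by the caller.

```python
import numpy as np

def leverage_scores(A, k):
    """Rank-k leverage score of each column of A (assumed standard definition)."""
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Score of column i = squared norm of the i-th column of the top-k block of Vt.
    return np.sum(Vt[:k, :] ** 2, axis=0)

def select_top_columns(A, k, c):
    """Deterministically keep the c columns with the largest rank-k leverage scores."""
    scores = leverage_scores(A, k)
    # A stable sort makes tie-breaking reproducible (lower index wins).
    order = np.argsort(-scores, kind="stable")
    return np.sort(order[:c]), scores

def projection_error(A, cols):
    """Frobenius-norm error of projecting A onto the span of the chosen columns."""
    C = A[:, cols]
    projected = C @ np.linalg.pinv(C) @ A
    return np.linalg.norm(A - projected, "fro")
```

A typical call would be `cols, scores = select_top_columns(A, k, c)` followed by `projection_error(A, cols)`, which can then be compared against the error of the best rank-k approximation obtained from the truncated SVD.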

Dimitris Papailiopoulos | Anastasios Kyrillidis | Christos Boutsidis
