Column subset selection via sparse approximation of SVD

Given a real matrix A ∈ ℝ^{m×n} of rank r and an integer k < r, the sum of the outer products of the top k singular vectors, scaled by the corresponding singular values, provides the best rank-k approximation A_k to A. When the columns of A have specific meaning, it may be desirable to find good approximations to A_k that use only a small number of columns of A. This paper provides a simple greedy algorithm for this problem in the Frobenius norm, with guarantees on both the performance and the number of columns chosen. The algorithm selects c columns from A, with c = O((k log k / ε²) η²(A)), such that ‖A − Π_C A‖_F ≤ (1 + ε) ‖A − A_k‖_F, where C is the matrix formed by the c selected columns, Π_C is the matrix that projects the columns of A onto the space spanned by C, and η(A) is a measure related to the coherence of the normalized columns of A. The algorithm is quite intuitive and is obtained by combining a greedy solution to a generalization of the well-known sparse approximation problem with an existence result on the possibility of sparse approximation. We provide empirical results on various specially constructed matrices, comparing our algorithm with previous deterministic approaches based on QR factorizations and with a recently proposed randomized algorithm. The results indicate that, in practice, the performance of the algorithm can be significantly better than the bounds suggest.
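
Since the abstract describes the method only at a high level, the sketch below is a minimal Python/NumPy illustration of the general greedy column-selection idea it alludes to: repeatedly pick the column of A that most reduces the Frobenius-norm residual of the rank-k target against the span of the columns chosen so far. The function name greedy_column_subset, the scoring rule, and the Gram-Schmidt-style updates are assumptions made for this sketch; they are not the paper's exact algorithm and carry none of its approximation guarantees.

```python
import numpy as np

def greedy_column_subset(A, c, k=None):
    """Greedily pick c column indices of A to approximate its best rank-k part.

    Illustrative sketch only (not the paper's algorithm): at each step, choose
    the column whose inclusion most reduces ||T - Pi_C T||_F^2, where T is the
    rank-k target A_k and Pi_C projects onto the span of the chosen columns.
    """
    m, n = A.shape
    if k is not None:
        # Target the best rank-k approximation A_k obtained from the SVD.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        T = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    else:
        T = A

    selected = []
    Q = np.zeros((m, 0))   # orthonormal basis for the span of chosen columns
    R = T.copy()           # residual of the target against that span
    for _ in range(c):
        # Score every unselected column by how much of the residual it explains.
        scores = np.full(n, -np.inf)
        for j in range(n):
            if j in selected:
                continue
            v = A[:, j] - Q @ (Q.T @ A[:, j])   # component outside current span
            nv = np.linalg.norm(v)
            if nv > 1e-12:
                scores[j] = np.linalg.norm(R.T @ (v / nv)) ** 2
        j_best = int(np.argmax(scores))
        if not np.isfinite(scores[j_best]):
            break   # remaining columns add nothing new to the span
        selected.append(j_best)
        # Extend the orthonormal basis and deflate the residual (Gram-Schmidt step).
        v = A[:, j_best] - Q @ (Q.T @ A[:, j_best])
        q = (v / np.linalg.norm(v)).reshape(-1, 1)
        Q = np.hstack([Q, q])
        R = R - q @ (q.T @ R)
    return selected
```

For example, greedy_column_subset(A, c=8, k=5) returns the indices of 8 columns chosen to capture as much of the best rank-5 approximation of A as this greedy criterion allows; the projection error ‖A − Π_C A‖_F can then be checked directly against ‖A − A_5‖_F.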
