Finding Maximum Volume Sub-matrices of a Matrix

Given a matrix $A \in \mathbb{R}^{m \times n}$ ($n$ vectors in $m$ dimensions), we consider the problem of selecting a submatrix (a subset of the columns) with maximum volume. The motivation for studying this problem is that if $A$ can be approximately reconstructed from a small number $k$ of its columns ($A$ has “numerical” rank $k$), then any set of $k$ independent columns of $A$ should suffice to reconstruct $A$. However, the reconstruction is numerically stable only if the chosen $k$ columns have large volume. We thus define an appropriate algorithmic problem Max-Vol($k$), which asks for the $k$ columns with maximum volume. We show that Max-Vol is NP-hard, and in fact does not admit any PTAS. In particular, it is NP-hard to approximate Max-Vol within $2\sqrt{2}/3 + \epsilon$. We study a natural greedy heuristic for Max-Vol and show that it has approximation ratio $2^{-O(k \log k)}$. We show that our analysis of the greedy heuristic is tight to within a logarithmic factor in the exponent by giving an instance of Max-Vol for which the greedy heuristic is $2^{-\Omega(k)}$ from optimal. When $A$ has unit-norm columns, a related problem is to select the maximum number of vectors with a given volume (this pre-specified volume could be the volume required on grounds of numerical stability for the reconstruction). We show that if the optimal solution selects $k$ columns, then greedy will select $\Omega(k / \log k)$ columns, providing a $\log k$-approximation.
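The abstract does not spell out the greedy heuristic, so the following is only a minimal sketch under an assumption: at each step, greedy picks the column with the largest component orthogonal to the span of the columns already chosen. The volume of the selected columns is then the product of those residual norms, which equals $\sqrt{\det(S^{T} S)}$ for the selected submatrix $S$. The function name `greedy_max_vol` and the tolerance `1e-12` are illustrative choices, not taken from the paper.

```python
import math


def greedy_max_vol(columns, k):
    """Greedily pick up to k column indices; return (indices, volume).

    Assumed rule (not stated in the abstract): at each step take the
    column with the largest norm after projecting out the columns
    already chosen.
    """
    # residuals[j] is column j minus its projection onto the span
    # of the columns chosen so far.
    residuals = [[float(x) for x in col] for col in columns]
    chosen, volume = [], 1.0
    for _ in range(min(k, len(columns))):
        # Unchosen column with the largest residual norm (first on ties).
        best = max(
            (j for j in range(len(residuals)) if j not in chosen),
            key=lambda j: sum(x * x for x in residuals[j]),
        )
        norm = math.sqrt(sum(x * x for x in residuals[best]))
        if norm < 1e-12:
            break  # every remaining column lies in the chosen span
        chosen.append(best)
        volume *= norm
        q = [x / norm for x in residuals[best]]
        # Remove the component along q from every residual.
        for j, r in enumerate(residuals):
            dot = sum(a * b for a, b in zip(r, q))
            residuals[j] = [a - dot * b for a, b in zip(r, q)]
    return chosen, volume


# Four columns in three dimensions; ask for the best pair.
cols = [(1, 0, 0), (0, 2, 0), (1, 1, 0), (0, 0, 0.5)]
indices, vol = greedy_max_vol(cols, 2)
print(indices, vol)
```

On this small instance the sketch first takes the longest column and then the column with the largest part orthogonal to it. Keeping the residuals up to date means each step costs one pass over the columns, instead of a fresh determinant per candidate subset.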
