Efficient Volume Sampling for Row/Column Subset Selection

We give efficient algorithms for volume sampling, i.e., for picking $k$-subsets of the rows of any given matrix with probabilities proportional to the squared volumes of the simplices defined by them and the origin (equivalently, the squared volumes of the parallelepipeds defined by these subsets of rows). In other words, we can efficiently sample $k$-subsets of $[m]$ with probabilities proportional to the corresponding $k \times k$ principal minors of any given $m \times m$ positive semidefinite matrix. This solves an open problem from the monograph on spectral algorithms by Kannan and Vempala (see Section 7.4 of \cite{KV}; the problem is also implicit in \cite{BDM, DRVW}).

Our first algorithm for volume sampling $k$-subsets of rows from an $m$-by-$n$ matrix runs in $O(kmn^\omega \log n)$ arithmetic operations (where $\omega$ is the exponent of matrix multiplication). A second variant, for $(1+\eps)$-approximate volume sampling, runs in $O(mn \log m \cdot k^{2}/\eps^{2} + m \log^{\omega} m \cdot k^{2\omega+1}/\eps^{2\omega} \cdot \log(k \eps^{-1} \log m))$ arithmetic operations, which is almost linear in the size of the input (i.e., the number of entries of the matrix) for small $k$.

Our efficient volume sampling algorithms imply the following results for low-rank matrix approximation:

(1) Given $A \in \reals^{m \times n}$, in $O(kmn^{\omega} \log n)$ arithmetic operations we can find $k$ of its rows such that projecting onto their span gives a $\sqrt{k+1}$-approximation to the matrix of rank $k$ closest to $A$ under the Frobenius norm. This improves upon the $O(k \sqrt{\log k})$-approximation of Boutsidis, Drineas, and Mahoney \cite{BDM} and matches the lower bound shown in \cite{DRVW}. The method of conditional expectations gives a \emph{deterministic} algorithm with the same complexity. The running time can be improved to $O(mn \log m \cdot k^{2}/\eps^{2} + m \log^{\omega} m \cdot k^{2\omega+1}/\eps^{2\omega} \cdot \log(k \eps^{-1} \log m))$ at the cost of an extra $(1+\eps)$ factor in the approximation.

(2) The same rows and projection as in (1) give a $\sqrt{(k+1)(n-k)}$-approximation to the matrix of rank $k$ closest to $A$ under the spectral norm. We also show an almost matching lower bound of $\sqrt{n}$, even for $k=1$.
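As a concrete illustration of the sampling distribution and the guarantee in point (1), the following sketch draws a volume sample by brute-force enumeration of all $\binom{m}{k}$ subsets and compares the resulting projection error against the best rank-$k$ Frobenius error from the SVD. This is emphatically not the paper's algorithm (whose whole point is to avoid this enumeration); the test matrix, subset size, and function names are illustrative assumptions.

```python
# Brute-force sketch of volume sampling, for illustration only: it enumerates
# all C(m, k) row subsets, whereas the paper's algorithms run in polynomial time.
import itertools

import numpy as np

def volume_sample(A, k, rng):
    """Draw a k-subset S of rows of A with Pr[S] proportional to det(A_S A_S^T),
    the squared volume of the parallelepiped spanned by the rows in S."""
    m = A.shape[0]
    subsets = list(itertools.combinations(range(m), k))
    vols = np.array([np.linalg.det(A[list(S)] @ A[list(S)].T) for S in subsets])
    probs = vols / vols.sum()
    return subsets[rng.choice(len(subsets), p=probs)]

def row_projection_error(A, S):
    """Frobenius error of projecting the rows of A onto the span of the rows in S."""
    Q, _ = np.linalg.qr(A[list(S)].T)  # orthonormal basis of the row span of A_S
    return np.linalg.norm(A - A @ Q @ Q.T, "fro")

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))  # small example matrix (assumption)
k = 2
S = volume_sample(A, k, rng)

# Best possible rank-k Frobenius error, from the tail singular values (Eckart-Young).
sigma = np.linalg.svd(A, compute_uv=False)
best = np.sqrt(np.sum(sigma[k:] ** 2))
ratio = row_projection_error(A, S) / best
# The paper's guarantee is E[error^2] <= (k+1) * best^2 over the random choice
# of S, i.e. a sqrt(k+1)-approximation in expectation; a single draw can be worse.
```

The ratio computed above is always at least $1$ (no row subset can beat the SVD), and averaging it over many draws illustrates the $\sqrt{k+1}$ bound in expectation.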

[1] Avner Magen, et al. Near Optimal Dimensionality Reductions That Preserve Volumes, 2008, APPROX-RANDOM.

[2] Claude-Pierre Jeannerod, et al. Essentially optimal computation of the inverse of generic polynomial matrices, 2005, J. Complex.

[3] Santosh S. Vempala, et al. Matrix approximation and projective clustering via volume sampling, 2006, SODA '06.

[4] Alan M. Frieze, et al. Fast Monte-Carlo algorithms for finding low-rank approximations, 2004, JACM.

[5] Volker Strassen, et al. Algebraic Complexity Theory, 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[6] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.

[7] Santosh S. Vempala, et al. Spectral Algorithms, 2009, Found. Trends Theor. Comput. Sci.

[8] S. Goreinov, et al. The maximum-volume concept in approximation by low-rank matrices, 2001.

[9] Petros Drineas, et al. CUR matrix decompositions for improved data analysis, 2009, Proceedings of the National Academy of Sciences.

[10] Michael Clausen, et al. Algebraic complexity theory, 1997, Grundlehren der mathematischen Wissenschaften.

[11] S. Goreinov, et al. Pseudo-skeleton approximations by matrices of maximal volume, 1997.

[12] C. Pan. On the existence and computation of rank-revealing LU factorizations, 2000.

[13] Christos Boutsidis, et al. An improved approximation algorithm for the column subset selection problem, 2008, SODA.

[14] Y. Peres, et al. Determinantal Processes and Independence, 2005, math/0503110.

[15] S. Muthukrishnan, et al. Relative-Error CUR Matrix Decompositions, 2007, SIAM J. Matrix Anal. Appl.

[16] Ming Gu, et al. Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization, 1996, SIAM J. Sci. Comput.

[17] Malik Magdon-Ismail, et al. On selecting a maximum volume sub-matrix of a matrix and related problems, 2009, Theor. Comput. Sci.

[18] Santosh S. Vempala, et al. Adaptive Sampling and Fast Low-Rank Matrix Approximation, 2006, APPROX-RANDOM.

[19] R. Lyons. Determinantal probability measures, 2002, math/0204325.

[20] Malik Magdon-Ismail, et al. Exponential Inapproximability of Selecting a Maximum Volume Sub-matrix, 2011, Algorithmica.