An improved approximation algorithm for the column subset selection problem

We consider the problem of selecting the "best" subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn², m²n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously-chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let P_C denote the projection matrix onto the span of those columns, and let A_k denote the "best" rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that

||A − P_C A||_2 ≤ O(k^{3/4} log^{1/2}(k) (n − k)^{1/4}) ||A − A_k||_2

with probability at least 0.7. This spectral norm bound improves upon the best previously-existing result (of Gu and Eisenstat [21]) for the spectral norm version of this Column Subset Selection Problem. We also prove that

||A − P_C A||_F ≤ O(k √(log k)) ||A − A_k||_F

with the same probability. This Frobenius norm bound is only a factor of √(k log k) worse than the best previously-existing existential result and is roughly O(√(k!)) better than the best previous algorithmic result (both of Deshpande et al. [11]) for the Frobenius norm version of this Column Subset Selection Problem.
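To make the two-stage procedure concrete, the following is a minimal sketch in Python (NumPy/SciPy), not the paper's algorithm verbatim: the function name two_stage_css, the particular sample size c, sampling without replacement, and the use of ordinary column-pivoted QR in the second stage (a stand-in for the strong rank-revealing QR factorization of Gu and Eisenstat [8] used in the actual deterministic stage) are all simplifying assumptions.

```python
# A minimal sketch of the two-stage column subset selection algorithm
# described above, assuming NumPy/SciPy. Simplifications: exact SVD,
# sampling without replacement, and ordinary column-pivoted QR in
# place of a strong rank-revealing QR.
import numpy as np
from scipy.linalg import qr

def two_stage_css(A, k, c=None, seed=0):
    """Return the indices of exactly k columns of the m x n matrix A."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    if c is None:
        c = min(n, int(np.ceil(k * np.log(k + 1))) + k)  # O(k log k) samples

    # Stage 1 (randomized): sample c columns with probabilities
    # proportional to the squared column norms of V_k^T, i.e. the
    # "leverage scores" computed from the top-k right singular
    # subspace of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(Vt[:k, :] ** 2, axis=0)  # leverage scores; they sum to k
    sampled = rng.choice(n, size=c, replace=False, p=lev / lev.sum())

    # Stage 2 (deterministic): pivoted QR on the sampled columns of
    # V_k^T; keep the first k pivots and map back to column indices of A.
    _, _, piv = qr(Vt[:k, sampled], pivoting=True)
    return sampled[piv[:k]]

# Usage: compare the residual of the selected columns against ||A - A_k||_F.
A = np.random.default_rng(1).standard_normal((100, 60))
k = 5
C = A[:, two_stage_css(A, k)]
residual = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A)  # ||A - P_C A||_F
_, s, _ = np.linalg.svd(A, full_matrices=False)
print(f"||A - P_C A||_F = {residual:.3f}, ||A - A_k||_F = {np.linalg.norm(s[k:]):.3f}")
```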

[1] L. Foster, Rank and null space calculations using matrix decomposition without column interchanges, 1986.

[2] W. Krzanowski, Selection of Variables to Preserve Multivariate Data Structure, Using Principal Components, 1987.

[3] S. Chatterjee, Sensitivity analysis in linear regression, 1988.

[4] Per Christian Hansen, et al., Some Applications of the Rank Revealing QR Factorization, 1992, SIAM J. Sci. Comput.

[5] C. Pan, et al., Rank-Revealing QR Factorizations and the Singular Value Decomposition, 1992.

[6] P. Tang, et al., Bounds on Singular Values Revealed by QR Factorizations, 1999.

[7] Per Christian Hansen, et al., Low-rank revealing QR factorizations, 1994, Numerical Linear Algebra with Applications.

[8] Ming Gu, et al., Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization, 1996, SIAM J. Sci. Comput.

[9] Rajeev Motwani, et al., Randomized algorithms, 1996, CSUR.

[10] Christian H. Bischof, et al., Computing rank-revealing QR factorizations of dense matrices, 1998, TOMS.

[11] Christian H. Bischof, et al., Algorithm 782: codes for rank-revealing QR factorizations of dense matrices, 1998, TOMS.

[12] G. W. Stewart, Four algorithms for the efficient computation of truncated pivoted QR approximations to a sparse matrix, 1999, Numerische Mathematik.

[13] C. Pan, On the existence and computation of rank-revealing LU factorizations, 2000.

[14] T. Chan, Rank Revealing QR Factorizations, 2001.

[15] Prabhakar Raghavan, et al., Competitive recommendation systems, 2002, STOC '02.

[16] Gérard Dreyfus, et al., Ranking a Random Feature for Variable and Feature Selection, 2003, J. Mach. Learn. Res.

[17] I. Guyon, et al., Detecting stable clusters using principal component analysis, 2003, Methods in Molecular Biology.

[18] Alan M. Frieze, et al., Fast Monte-Carlo algorithms for finding low-rank approximations, 2004, JACM.

[19] Per Christian Hansen, et al., UTV Tools: Matlab templates for rank-revealing UTV decompositions, 1999, Numerical Algorithms.

[20] Amnon Shashua, et al., Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach, 2005.

[21] Kezhi Mao, Identifying critical variables of principal components for unsupervised feature selection, 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[22] Santosh S. Vempala, et al., Matrix approximation and projective clustering via volume sampling, 2006, SODA '06.

[23] V. Rokhlin, et al., A randomized algorithm for the approximation of matrices, 2006.

[24] L. Foster, et al., Comparison of rank revealing algorithms applied to matrices with well defined numerical ranks, 2006.

[26] S. Muthukrishnan, et al., Subspace Sampling and Relative-Error Matrix Approximation: Column-Based Methods, 2006, APPROX-RANDOM.

[27] Petros Drineas, et al., Tensor-CUR decompositions for tensor-based data, 2006, KDD '06.

[28] Petros Drineas, et al., Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication, 2006, SIAM J. Comput.

[29] Santosh S. Vempala, et al., Adaptive Sampling and Fast Low-Rank Matrix Approximation, 2006, APPROX-RANDOM.

[30] Mark Rudelson, et al., Sampling from large matrices: An approach through geometric functional analysis, 2005, JACM.

[31] Jimeng Sun, et al., Less is More: Compact Matrix Decomposition for Large Sparse Graphs, 2007, SDM.

[32] Huan Liu, et al., Spectral feature selection for supervised and unsupervised learning, 2007, ICML '07.

[33] Gene H. Golub, et al., Numerical methods for solving linear least squares problems, 1965, Milestones in Matrix Computation.

[34] Michael W. Mahoney, et al., PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations, 2007, PLoS Genetics.

[35] M. Magdon-Ismail, et al., Finding Maximum Volume Sub-matrices of a Matrix, 2007.

[36] V. Rokhlin, et al., A fast randomized algorithm for the approximation of matrices, 2007.

[37] Christos Boutsidis, et al., Unsupervised feature selection for principal components analysis, 2008, KDD.

[38] S. Muthukrishnan, et al., Relative-Error CUR Matrix Decompositions, 2007, SIAM J. Matrix Anal. Appl.

[39] Robert H. Halstead, Matrix Computations, 2011, Encyclopedia of Parallel Computing.

[40] S. Muthukrishnan, et al., Faster least squares approximation, 2007, Numerische Mathematik.