Subset Selection Algorithms: Randomized vs. Deterministic

Abstract. Subset selection is a method for selecting a subset of columns from a real matrix, so that the subset represents the entire matrix well and is far from being rank deficient. We begin by extending a deterministic subset selection algorithm to matrices that have more columns than rows. Then we investigate a two-stage subset selection algorithm that uses a randomized stage to pick a smaller number of candidate columns, which are forwarded to the deterministic stage for subset selection. We perform extensive numerical experiments to compare the accuracy of this algorithm with the best known deterministic algorithm. We also introduce an iterative algorithm that systematically determines the number of candidate columns picked in the randomized stage, and we provide a recommendation for a specific value. Motivated by our experimental results, we propose a new two-stage deterministic algorithm for subset selection. In our numerical experiments, this new algorithm appears to be as accurate as the best deterministic algorithm, but it is faster, and it is also easier to implement than the randomized algorithm.
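The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the algorithm studied in the paper: the sampling distribution (probability proportional to squared column norm), the use of plain greedy column pivoting as the deterministic stage, and the names pivoted_qr_select, two_stage_select, c, and seed are all assumptions made for illustration. Matrices are plain lists of rows, so that only the Python standard library is needed.

```python
import random


def pivoted_qr_select(A, k, tol=1e-12):
    """Pick up to k column indices of A (a list of rows) by greedy column pivoting.

    Illustrative stand-in for a deterministic stage: at each step the column
    with the largest residual norm is chosen, and the remaining columns are
    orthogonalized against it.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # residual columns
    chosen = []
    for _ in range(min(k, n)):
        # Squared residual norms; columns already chosen are excluded.
        norms = [-1.0 if j in chosen else sum(x * x for x in cols[j])
                 for j in range(n)]
        p = max(range(n), key=norms.__getitem__)
        if norms[p] <= tol:
            break  # remaining columns are numerically dependent on the chosen ones
        chosen.append(p)
        scale = norms[p] ** 0.5
        q = [x / scale for x in cols[p]]
        # Remove the component along q from every column not yet chosen.
        for j in range(n):
            if j not in chosen:
                dot = sum(a * b for a, b in zip(q, cols[j]))
                cols[j] = [a - dot * b for a, b in zip(cols[j], q)]
    return chosen


def two_stage_select(A, k, c, seed=0):
    """Sample c candidate columns at random, then select up to k of them."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Assumed sampling weights: squared column norms (zero columns are never drawn).
    weights = {j: sum(A[i][j] ** 2 for i in range(m)) for j in range(n)}
    pool = [j for j in range(n) if weights[j] > 0.0]
    candidates = []
    # Stage 1 (randomized): weighted sampling without replacement.
    for _ in range(min(c, len(pool))):
        j = rng.choices(pool, weights=[weights[i] for i in pool], k=1)[0]
        pool.remove(j)
        candidates.append(j)
    # Stage 2 (deterministic): column pivoting restricted to the candidates.
    sub = [[row[j] for j in candidates] for row in A]
    return [candidates[j] for j in pivoted_qr_select(sub, k)]
```

For a 3-by-4 matrix with columns (3, 0, 0), (0, 2, 0), (1, 1, 0), and (0, 0, 0), calling two_stage_select with k = 2 and c = 4 forwards all three nonzero columns to the deterministic stage, which returns the column indices [0, 1]. With a smaller c the result depends on which candidates the randomized stage happens to draw, which is why the choice of c matters.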
