Column subset selection, matrix factorization, and eigenvalue optimization

Given a fixed matrix, the problem of column subset selection asks for a column submatrix that has favorable spectral properties. Most research from the algorithms and numerical linear algebra communities focuses on a variant called rank-revealing QR, which seeks a well-conditioned collection of columns that spans the (numerical) range of the matrix. The functional analysis literature contains another strand of work on column selection whose algorithmic implications have not been explored. In particular, a celebrated result of Bourgain and Tzafriri demonstrates that each matrix with normalized columns contains a large column submatrix that is exceptionally well conditioned. Unfortunately, standard proofs of this result cannot be regarded as algorithmic. This paper presents a randomized, polynomial-time algorithm that produces the submatrix promised by Bourgain and Tzafriri. The method involves random sampling of columns, followed by a matrix factorization that exposes the well-conditioned subset of columns. This factorization, which is due to Grothendieck, is regarded as a central tool in modern functional analysis. The primary novelty in this work is an algorithm, based on eigenvalue minimization, for constructing the Grothendieck factorization. These ideas also result in an approximation algorithm for the (∞, 1) norm of a matrix, which is generally NP-hard to compute exactly. As a bonus, this work reveals a surprising connection between matrix factorization and the famous maxcut semidefinite program.
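
To make the (∞, 1) norm concrete, the sketch below evaluates it by exhaustive search on a tiny matrix. It assumes the usual operator-norm definition, the maximum of ‖Ax‖₁ over ‖x‖∞ ≤ 1. Since ‖Ax‖₁ is convex in x, that maximum is attained at a sign vector, so it suffices to try every x in {−1, +1}ⁿ. The function name `inf_to_one_norm` is illustrative only. This is not the approximation algorithm described in the abstract; the 2ⁿ cost of the search simply shows why exact computation does not scale.

```python
from itertools import product


def inf_to_one_norm(A):
    """Brute-force (inf, 1) norm of a matrix given as a list of rows.

    Assumes the definition max ||A x||_1 over ||x||_inf <= 1, which is
    attained at some sign vector x in {-1, +1}^n. Cost grows as 2^n.
    """
    n = len(A[0])
    best = 0.0
    for x in product((-1.0, 1.0), repeat=n):
        # ||A x||_1: sum over rows of |<row, x>|
        value = sum(abs(sum(a * xj for a, xj in zip(row, x))) for row in A)
        best = max(best, value)
    return best


if __name__ == "__main__":
    print(inf_to_one_norm([[1, 2], [3, 4]]))
```

For the 2 × 2 example, the all-ones sign vector gives |1 + 2| + |3 + 4| = 10, and no other sign pattern does better.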
