Reduced-Rank Regression with Operator Norm Error

A common data analysis task is the reduced-rank regression problem: $$\min_{\textrm{rank-}k \ X} \|AX-B\|,$$ where $A \in \mathbb{R}^{n \times c}$ and $B \in \mathbb{R}^{n \times d}$ are given large matrices and $\|\cdot\|$ is some norm. Here the unknown matrix $X \in \mathbb{R}^{c \times d}$ is constrained to have rank $k$, since this constraint substantially reduces the number of parameters in the solution when $c$ and $d$ are large. For Frobenius norm error, this problem has a standard closed-form solution, as well as a fast algorithm for finding a $(1+\varepsilon)$-approximate solution. However, for the important case of operator norm error, no closed-form solution is known, and the fastest known algorithms take singular value decomposition time. We give the first randomized algorithms for this problem running in time $$(\text{nnz}(A) + \text{nnz}(B) + c^2) \cdot k/\varepsilon^{1.5} + (n+d)k^2/\varepsilon + c^{\omega},$$ up to polylogarithmic factors in the condition numbers, the matrix dimensions, and $1/\varepsilon$. Here $\text{nnz}(M)$ denotes the number of non-zero entries of a matrix $M$, and $\omega$ is the exponent of matrix multiplication. Since both (1) spectral low-rank approximation ($A = B$) and (2) linear system solving ($c = n$ and $d = 1$) are special cases, our running time cannot be improved by more than a $1/\varepsilon$ factor (up to polylogarithmic factors) without a major breakthrough in linear algebra. Interestingly, known techniques for low-rank approximation, such as alternating minimization and sketch-and-solve, provably fail for this problem. Instead, our algorithm uses an existential characterization of a solution, together with Krylov methods, low-degree polynomial approximation, and sketching-based preconditioning.
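
The Frobenius-norm closed form referred to above is the classical reduced-rank regression solution $X^* = A^+\,[A A^+ B]_k$, where $A^+$ is the Moore-Penrose pseudoinverse and $[M]_k$ denotes the best rank-$k$ approximation of $M$ (truncated SVD). The following is a minimal NumPy sketch of that formula; the function name is ours, and the dense-SVD route shown here is purely illustrative, not the fast $(1+\varepsilon)$-approximation algorithm mentioned in the abstract.

```python
import numpy as np

def frobenius_rank_k_regression(A, B, k):
    """Closed-form solution of min_{rank-k X} ||A X - B||_F.

    Classical result: X* = A^+ [P_A B]_k, where P_A = A A^+ projects
    onto the column space of A and [M]_k is the best rank-k
    approximation of M (truncated SVD).
    """
    A_pinv = np.linalg.pinv(A)            # A^+, shape (c, n)
    PB = A @ (A_pinv @ B)                 # P_A B: projection of B onto col(A)
    # Best rank-k approximation of P_A B via truncated SVD.
    U, s, Vt = np.linalg.svd(PB, full_matrices=False)
    PB_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return A_pinv @ PB_k                  # X* has rank at most k

# Example usage on random data.
rng = np.random.default_rng(0)
n, c, d, k = 100, 20, 30, 5
A, B = rng.standard_normal((n, c)), rng.standard_normal((n, d))
X = frobenius_rank_k_regression(A, B, k)
assert np.linalg.matrix_rank(X) <= k
```

Note that the decomposition $\|AX - B\|_F^2 = \|AX - P_A B\|_F^2 + \|(I - P_A)B\|_F^2$ is what makes this two-step recipe (project, then truncate) optimal for the Frobenius norm; no analogous closed form is known for the operator norm.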
