On the Power of Preconditioning in Sparse Linear Regression

Sparse linear regression is a fundamental problem in high-dimensional statistics, but strikingly little is known about how to efficiently solve it without restrictive conditions on the design matrix. We consider the (correlated) random design setting, where the covariates are independently drawn from a multivariate Gaussian N(0, Σ), for some n×n positive semi-definite matrix Σ, and seek estimators ŵ minimizing (ŵ − w∗)ᵀ Σ (ŵ − w∗), where w∗ is the k-sparse ground truth. Information-theoretically, one can achieve strong error bounds with only O(k log n) samples for arbitrary Σ and w∗; however, no efficient algorithms are known to match these guarantees even with o(n) samples, without further assumptions on Σ or w∗. Yet there is little evidence for this gap in the random design setting: computational lower bounds are only known for worst-case design matrices. To date, random-design instances (i.e. specific covariance matrices Σ) have only been proven hard against the Lasso program and variants; moreover, these “hard” instances can often be solved by Lasso after a simple change of basis (i.e. preconditioning). In this work, we give both upper and lower bounds clarifying the power of preconditioning as a tool for solving sparse linear regression problems. On the one hand, we show that the preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally: it succeeds whenever the dependency structure of the covariates, in the sense of the Markov property, has low treewidth — even if Σ is highly ill-conditioned. This upper bound builds on ideas from the wavelet and signal processing literature. As a special case of this result, we give an algorithm for sparse linear regression with covariates from an autoregressive time series model, where we also show that the (usual) Lasso provably fails. On the other hand, we construct (for the first time) random-design instances which are provably hard even for an optimally preconditioned Lasso. In fact, we complete our treewidth classification by proving that for any treewidth-t graph, there exists a Gaussian Markov Random Field on this graph such that the preconditioned Lasso, with any choice of preconditioner, requires Ω(t) samples to recover O(log n)-sparse signals when covariates are drawn from this model.

kelner@mit.edu. This work was supported in part by NSF Large CCF-1565235, NSF Medium CCF-1955217, and NSF TRIPODS 1740751.
fkoehler@mit.edu. This work was supported in part by NSF CAREER Award CCF-1453261, NSF Large CCF-1565235, A. Moitra’s ONR Young Investigator Award, and E. Mossel’s Vannevar Bush Faculty Fellowship ONR N00014-20-1-2826.
raghum@cs.ucla.edu. This work was supported in part by NSF CAREER Award CCF-1553605 and NSF Small CCF-2007682.
drohatgi@mit.edu. This work was supported in part by NSF Large CCF-1565235, NSF Medium CCF-1955217, and the MIT UROP Office.

arXiv:2106.09207v1 [cs.LG] 17 Jun 2021
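To make the change-of-basis idea concrete, the following is a minimal sketch in Python of "Lasso after preconditioning" on a random-walk (AR-like) design. It is not the paper's algorithm or construction: the differencing-plus-Haar preconditioner, the problem sizes, and the regularization scale are illustrative assumptions, and scikit-learn's Lasso stands in for any Lasso solver.

```python
# Sketch only: Lasso after a change of basis w = M v on a random-walk design.
# The preconditioner M (differencing followed by a Haar transform) is an
# illustrative assumption, not the paper's construction. Requires numpy, scikit-learn.
import numpy as np
from sklearn.linear_model import Lasso


def haar_analysis(d):
    """Orthonormal Haar analysis matrix A (rows are Haar vectors); d must be a power of 2."""
    A = np.array([[1.0]])
    while A.shape[0] < d:
        m = A.shape[0]
        A = np.vstack([np.kron(A, [1.0, 1.0]),
                       np.kron(np.eye(m), [1.0, -1.0])]) / np.sqrt(2.0)
    return A


def preconditioned_lasso(X, y, M, lam):
    """Run the Lasso in the basis given by M (regress y on X @ M), then map back."""
    fit = Lasso(alpha=lam, fit_intercept=False, max_iter=100_000).fit(X @ M, y)
    return M @ fit.coef_


rng = np.random.default_rng(0)
n, d, k, sigma = 300, 256, 5, 0.1

# Random-walk covariates: each row of X is a cumulative sum of i.i.d. Gaussians,
# so X = G @ U with U upper-triangular ones and Sigma = U.T @ U badly ill-conditioned.
G = rng.standard_normal((n, d))
U = np.triu(np.ones((d, d)))
X = G @ U
Sigma = U.T @ U

w_star = np.zeros(d)
w_star[rng.choice(d, size=k, replace=False)] = rng.standard_normal(k)
y = X @ w_star + sigma * rng.standard_normal(n)

# Preconditioner M = U^{-1} @ A.T: differencing whitens the covariates (X @ U^{-1} = G),
# and the transformed coefficients A @ U @ w_star are the Haar coefficients of the
# suffix sums of w_star, a piecewise-constant vector with k jumps, hence only
# O(k log d) of them are nonzero.
A = haar_analysis(d)
Dinv = np.eye(d) - np.diag(np.ones(d - 1), 1)   # U^{-1}: first-difference operator
M = Dinv @ A.T

lam = sigma * np.sqrt(2 * np.log(d) / n)        # heuristic regularization scale
w_pre = preconditioned_lasso(X, y, M, lam)
w_plain = Lasso(alpha=lam, fit_intercept=False, max_iter=100_000).fit(X, y).coef_


def pred_err(w):
    # The error metric from the abstract: (w - w*)^T Sigma (w - w*).
    diff = w - w_star
    return float(diff @ Sigma @ diff)


print("preconditioned Lasso:", pred_err(w_pre))
print("plain Lasso:         ", pred_err(w_plain))
```

The same regularization scale is reused for the plain Lasso purely for comparison; it is a heuristic choice, not a tuned or theoretically justified one.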
