Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm

We consider a sparse high-dimensional regression model in which the goal is to recover a $k$-sparse unknown vector $\beta^*$ from $n$ noisy linear observations of the form $Y=X\beta^*+W \in \mathbb{R}^n$, where $X \in \mathbb{R}^{n \times p}$ has iid $N(0,1)$ entries and $W \in \mathbb{R}^n$ has iid $N(0,\sigma^2)$ entries. Under certain assumptions on the parameters, an intriguing asymptotic gap appears between the minimum value of $n$, call it $n^*$, for which recovery is information-theoretically possible, and the minimum value of $n$, call it $n_{\mathrm{alg}}$, for which an efficient algorithm is known to provably recover $\beta^*$. In \cite{gamarnikzadik} it was conjectured that this gap is not artificial, in the sense that the problem is algorithmically hard for sample sizes $n \in [n^*,n_{\mathrm{alg}}]$. We support this conjecture in two ways. First, we show that the optimal solution of the LASSO provably fails to $\ell_2$-stably recover the unknown vector $\beta^*$ when $n \in [n^*,c n_{\mathrm{alg}}]$ for some sufficiently small constant $c>0$. Second, we establish that $n_{\mathrm{alg}}$ is, up to a multiplicative constant factor, a phase transition point for the appearance of a certain Overlap Gap Property (OGP) over the space of $k$-sparse vectors. The presence of such an OGP phase transition, a notion originating in statistical physics, is known to provide evidence of algorithmic hardness. Finally, we show that if $n>C n_{\mathrm{alg}}$ for some large enough constant $C>0$, a very simple algorithm based on a local search improvement rule both $\ell_2$-stably recovers the unknown vector $\beta^*$ and correctly infers its support, adding it to the list of provably successful algorithms for the high-dimensional linear regression problem.
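
To make the setup and the final algorithmic result concrete, the following Python sketch simulates the model $Y=X\beta^*+W$ and runs a naive local search over $k$-subsets of columns, accepting any single-coordinate swap of the current support that lowers the least-squares residual. The dimensions, the unit coefficients, the random initialization, and the first-improvement swap rule are illustrative assumptions made for this sketch only; they are not the exact algorithm or parameter regime analyzed in the paper.

```python
import numpy as np

# Illustrative simulation of Y = X beta* + W with a k-sparse beta*.
# The dimensions, noise level, and unit coefficients are assumptions
# chosen for demonstration, not the paper's precise setting.
rng = np.random.default_rng(0)
n, p, k, sigma = 300, 1000, 5, 0.1

X = rng.standard_normal((n, p))                      # iid N(0,1) design
support_true = rng.choice(p, size=k, replace=False)
beta_star = np.zeros(p)
beta_star[support_true] = 1.0                        # unit coefficients on the support
Y = X @ beta_star + sigma * rng.standard_normal(n)

def residual_norm(S):
    """Least-squares fit of Y on the columns indexed by S; returns the residual l2 norm."""
    coef, *_ = np.linalg.lstsq(X[:, S], Y, rcond=None)
    return np.linalg.norm(Y - X[:, S] @ coef)

# Local search over k-subsets: repeatedly try swapping one index in the
# current support for one outside it, accept any swap that lowers the
# residual norm, and stop when no improving swap exists.
S = list(rng.choice(p, size=k, replace=False))       # random initial support
best = residual_norm(S)
improved = True
while improved:
    improved = False
    for i in range(k):
        for j in range(p):
            if j in S:
                continue
            cand = S.copy()
            cand[i] = j
            val = residual_norm(cand)
            if val < best:
                S, best = cand, val
                improved = True

print("recovered support:", sorted(int(v) for v in S))
print("true support:     ", sorted(int(v) for v in support_true))
```

In this toy setting the sample size is far above $k\log p$, mimicking the regime $n > C n_{\mathrm{alg}}$, and the swap-based search typically terminates at the true support; no such guarantee is claimed here for other parameter choices.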

[1] Guangjie Han, et al. Consensus-based sparse signal reconstruction algorithm for wireless sensor networks, 2016, International Journal of Distributed Sensor Networks.

[2] M. West, et al. High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics, 2008, Journal of the American Statistical Association.

[3] Martin J. Wainwright, et al. Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell_{1}$-Constrained Quadratic Programming (Lasso), 2009, IEEE Transactions on Information Theory.

[4] Terence Tao, et al. The Dantzig selector: Statistical estimation when p is much larger than n, 2005, arXiv:math/0506081.

[5] Lie Wang, et al. Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise, 2011, IEEE Transactions on Information Theory.

[6] Kamiar Rahnama Rad. Nearly Sharp Sufficient Conditions on Exact Sparsity Pattern Recovery, 2009, IEEE Transactions on Information Theory.

[7] Andrew R. Barron, et al. Fast Sparse Superposition Codes Have Near Exponential Error Probability for $R<{\cal C}$, 2014, IEEE Transactions on Information Theory.

[8] Sara van de Geer, et al. Confidence sets in sparse regression, 2012, arXiv:1209.1508.

[9] David Gamarnik, et al. Finding a large submatrix of a Gaussian random matrix, 2016, The Annals of Statistics.

[10] Amin Coja-Oghlan, et al. On independent sets in random graphs, 2010, SODA '11.

[11] Martin J. Wainwright, et al. Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting, 2009, IEEE Transactions on Information Theory.

[12] Mike E. Davies, et al. Iterative Hard Thresholding for Compressed Sensing, 2008, arXiv.

[13] N. Meinshausen, et al. High-dimensional graphs and variable selection with the Lasso, 2006, arXiv:math/0608017.

[14] F. T. Wright. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables, 1971.

[15] Emmanuel J. Candès, et al. Decoding by linear programming, 2005, IEEE Transactions on Information Theory.

[16] Bálint Virág, et al. Local algorithms for independent sets are half-optimal, 2014, arXiv.

[17] David L. Donoho, et al. Counting the Faces of Randomly-Projected Hypercubes and Orthants, with Applications, 2008, Discrete & Computational Geometry.

[18] Mohamed-Slim Alouini, et al. The BOX-LASSO with application to GSSK modulation in massive MIMO systems, 2017, IEEE International Symposium on Information Theory (ISIT).

[19] Florent Krzakala, et al. Replica analysis and approximate message passing decoder for superposition codes, 2014, IEEE International Symposium on Information Theory.

[20] D. Donoho, et al. Counting faces of randomly-projected polytopes when the projection radically lowers dimension, 2006, arXiv:math/0607364.

[21] Michele Zorzi, et al. Sensing, Compression, and Recovery for WSNs: Sparse Signal Modeling and Monitoring Framework, 2012, IEEE Transactions on Wireless Communications.

[22] Amin Coja-Oghlan, et al. On the solution-space geometry of random constraint satisfaction problems, 2011, Random Structures & Algorithms.

[23] Ramji Venkataramanan, et al. Capacity-Achieving Sparse Superposition Codes via Approximate Message Passing Decoding, 2015, IEEE Transactions on Information Theory.

[24] E. Candès, et al. Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism, 2010, arXiv:1007.1434.

[25] I. Johnstone, et al. On Consistency and Sparsity for Principal Components Analysis in High Dimensions, 2009, Journal of the American Statistical Association.

[26] Amin Coja-Oghlan, et al. Algorithmic Barriers from Phase Transitions, 2008, 49th Annual IEEE Symposium on Foundations of Computer Science.

[27] Babak Hassibi, et al. Recovering Sparse Signals Using Sparse Measurement Matrices in Compressed DNA Microarrays, 2008, IEEE Journal of Selected Topics in Signal Processing.

[28] R. DeVore, et al. A Simple Proof of the Restricted Isometry Property for Random Matrices, 2008.

[29] Michael A. Saunders, et al. Atomic Decomposition by Basis Pursuit, 1998, SIAM Journal on Scientific Computing.

[30] David Gamarnik, et al. High Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition, 2017, COLT.

[31] M. Lustig, et al. Compressed Sensing MRI, 2008, IEEE Signal Processing Magazine.

[32] E. George. The Variable Selection Problem, 2000.

[33] E. Candès, et al. Stable signal recovery from incomplete and inaccurate measurements, 2005, arXiv:math/0503066.

[34] S. van de Geer, et al. On the conditions used to prove oracle results for the Lasso, 2009, arXiv:0910.0722.

[35] David L. Donoho, et al. Compressed sensing, 2006, IEEE Transactions on Information Theory.

[36] Galen Reeves, et al. Approximate Sparsity Pattern Recovery: Information-Theoretic Lower Bounds, 2010, IEEE Transactions on Information Theory.

[37] James B. Brown, et al. An overview of recent developments in genomics and associated statistical methods, 2009, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[38] Trevor Hastie, et al. Statistical Learning with Sparsity: The Lasso and Generalizations, 2015.

[39] D. Gamarnik, et al. Limits of local algorithms over sparse random graphs, 2017.

[40] Andrea Montanari, et al. Reconstruction and Clustering in Random Constraint Satisfaction Problems, 2011, SIAM Journal on Discrete Mathematics.

[41] Martin J. Wainwright, et al. Information-Theoretic Limits on Sparse Signal Recovery: Dense versus Sparse Measurement Matrices, 2008, IEEE Transactions on Information Theory.

[42] Christos Thrampoulidis, et al. The squared-error of generalized LASSO: A precise analysis, 2013, 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[43] Stéphane Mallat, et al. Matching pursuits with time-frequency dictionaries, 1993, IEEE Transactions on Signal Processing.

[44] T. Cai, et al. Accuracy assessment for high-dimensional linear regression, 2016, The Annals of Statistics.

[45] Ping Zhang. Model Selection Via Multifold Cross Validation, 1993.

[46] Lucas Janson, et al. EigenPrism: inference for high dimensional signal-to-noise ratios, 2015, Journal of the Royal Statistical Society, Series B (Statistical Methodology).

[47] P. Bickel, et al. Simultaneous analysis of Lasso and Dantzig selector, 2008, arXiv:0801.1095.