High Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition

We consider a sparse linear regression model Y=X\beta^{*}+W, where X has i.i.d. Gaussian entries, W is a noise vector with i.i.d. mean-zero Gaussian entries of variance \sigma^{2}, and \beta^{*} is a binary vector with support size (sparsity) k. Using a novel conditional second moment method, we obtain an approximation, tight up to a multiplicative constant, of the optimal squared error \min_{\beta}\|Y-X\beta\|_{2}, where the minimization is over all k-sparse binary vectors \beta. The approximation reveals interesting structural properties of the underlying regression problem. In particular, a) we establish that n^{*}=2k\log p/\log(2k/\sigma^{2}+1) is a phase transition point with the following "all-or-nothing" property: when n exceeds n^{*}, (2k)^{-1}\|\beta_{2}-\beta^{*}\|_{0}\approx 0, and when n is below n^{*}, (2k)^{-1}\|\beta_{2}-\beta^{*}\|_{0}\approx 1, where \beta_{2} is the optimal solution achieving the smallest squared error. With this we prove that n^{*} is the asymptotic information-theoretic threshold for recovering \beta^{*}. b) We compute the squared error for an intermediate problem \min_{\beta}\|Y-X\beta\|_{2}, where the minimization is restricted to vectors \beta with \|\beta-\beta^{*}\|_{0}=2k\zeta for \zeta\in[0,1]. We show that the lower-bound part \Gamma(\zeta) of the estimate, which corresponds to the estimate obtained by the first moment method, undergoes a phase transition at three different thresholds: first at n_{\text{inf,1}}=\sigma^{2}\log p, which is the information-theoretic bound for recovering \beta^{*} when k=1 and \sigma is large, then at n^{*}, and finally at n_{\text{LASSO/CS}}. c) We establish a certain Overlap Gap Property (OGP) on the space of all binary vectors \beta when n\le ck\log p for a sufficiently small constant c. We conjecture that OGP is the source of algorithmic hardness of solving the minimization problem \min_{\beta}\|Y-X\beta\|_{2} in the regime n<n_{\text{LASSO/CS}}.
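To make the "all-or-nothing" behavior in a) concrete, the following is a minimal simulation sketch (not from the paper; the helper name simulate, the problem sizes, and the trial counts are our own illustrative choices). It brute-forces the k-sparse binary least-squares problem at sample sizes on either side of n^{*} and reports the normalized Hamming error (2k)^{-1}\|\beta_{2}-\beta^{*}\|_{0}. Since the theorem is asymptotic in p, a tiny instance can only suggest the transition, not reproduce it sharply.

```python
import itertools
import numpy as np

def simulate(n, p, k, sigma, rng):
    """Draw Y = X beta* + W and brute-force the k-sparse binary LS fit."""
    X = rng.standard_normal((n, p))              # i.i.d. Gaussian design
    support = rng.choice(p, size=k, replace=False)
    beta_star = np.zeros(p)
    beta_star[support] = 1.0                     # binary k-sparse signal
    Y = X @ beta_star + sigma * rng.standard_normal(n)

    best_err, best_support = np.inf, None
    # Exhaustive search over all C(p, k) supports -- feasible only for tiny p, k.
    for S in itertools.combinations(range(p), k):
        err = np.linalg.norm(Y - X[:, list(S)].sum(axis=1))  # ||Y - X beta||_2
        if err < best_err:
            best_err, best_support = err, S
    # Normalized Hamming distance (2k)^{-1} ||beta_2 - beta*||_0: for binary
    # vectors this is the symmetric difference of the two supports over 2k.
    return len(set(best_support) ^ set(support)) / (2 * k)

rng = np.random.default_rng(0)
p, k, sigma = 30, 3, 0.5
n_star = 2 * k * np.log(p) / np.log(2 * k / sigma**2 + 1)  # phase transition point
for n in (int(0.5 * n_star) + 1, int(2 * n_star)):         # below and above n*
    errs = [simulate(n, p, k, sigma, rng) for _ in range(20)]
    print(f"n = {n} (n* ~ {n_star:.1f}): mean normalized error {np.mean(errs):.2f}")
```

Above n^{*} the reported error should concentrate near 0 and below it near 1; the exhaustive search is exactly what makes the estimator information-theoretically optimal yet algorithmically expensive, which is the tension the OGP conjecture in c) addresses.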
