Revisiting Marginal Regression

The lasso has become an important practical tool for high-dimensional regression as well as the object of intense theoretical investigation. But despite the availability of efficient algorithms, the lasso remains computationally demanding in regression problems where the number of variables vastly exceeds the number of data points. A much older method, marginal regression, largely displaced by the lasso, offers a promising alternative in this setting: its computation is practical even when the dimension is very high. In this paper, we study the relative performance of the lasso and marginal regression in three regimes: (a) exact reconstruction in the noise-free and noisy cases when design and coefficients are fixed, (b) exact reconstruction in the noise-free case when the design is fixed but the coefficients are random, and (c) reconstruction in the noisy case where performance is measured by the number of coefficients whose sign is incorrect. In the first regime, we compare the conditions for exact reconstruction of the two procedures, find examples where each procedure succeeds while the other fails, and characterize the advantages and disadvantages of each. In the second regime, we derive conditions under which marginal regression will provide exact reconstruction with high probability. And in the third regime, we derive rates of convergence for the procedures and offer a new partitioning of the ``phase diagram'' that shows when exact or Hamming reconstruction is effective.
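To make the computational contrast concrete, the following minimal sketch (an illustration only, not the procedure analyzed in the paper) treats marginal regression as variable selection by thresholding the marginal correlations X^T y, an O(np) computation that stays feasible when the number of variables p vastly exceeds the sample size n. The standardized columns, the simulated data, and the threshold value are all assumptions made for the example.

import numpy as np

def marginal_regression_select(X, y, threshold):
    # Marginal regression screening: with columns of X standardized
    # (mean 0, unit norm), the marginal coefficients are simply X^T y.
    # Selecting variables costs O(n p), versus solving a full lasso program.
    scores = X.T @ y
    return np.flatnonzero(np.abs(scores) > threshold)

# Hypothetical usage on simulated sparse data with p >> n.
rng = np.random.default_rng(0)
n, p, s = 100, 5000, 10                    # s true nonzero coefficients
X = rng.standard_normal((n, p))
X = X / np.linalg.norm(X, axis=0)          # standardize columns to unit norm
beta = np.zeros(p)
beta[:s] = 3.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# The threshold here is arbitrary and purely illustrative; the paper's
# analysis concerns how such a cutoff should be chosen.
selected = marginal_regression_select(X, y, threshold=1.0)
print(sorted(selected.tolist()))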
