A Lava Attack on the Recovery of Sums of Dense and Sparse Signals

Common high-dimensional methods for prediction rely on having either a sparse signal model, a model in which most parameters are zero and the few non-zero parameters are large in magnitude, or a dense signal model, a model with no large parameters and very many small non-zero parameters. We consider a generalization of these two basic models, termed here a “sparse + dense” model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model and provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein's unbiased estimator for lava's prediction risk. A simulation study compares the performance of lava to lasso, ridge, and elastic net in a regression setting with data-dependent penalty parameters and illustrates lava's improved performance relative to these benchmarks.
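
To make the “sparse + dense” idea concrete, the sketch below implements one common formulation of a lava-type estimator: the coefficient vector is split as beta = beta1 + beta2, with an l1 penalty on the sparse part beta1 and a squared-l2 (ridge) penalty on the dense part beta2. This is a minimal illustration, not the paper's implementation: the function names (soft_threshold, lava_sequence, lava_regression), the exact penalty scaling (no 1/2 factors), and the use of a plain proximal-gradient (ISTA) solver for the profiled lasso step are all assumptions made for the example.

```python
import numpy as np


def soft_threshold(z, t):
    """Coordinate-wise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lava_sequence(y, lam1, lam2):
    """Lava-type estimate in the Gaussian sequence model y_i = theta_i + noise.

    Minimizes, coordinate by coordinate,
        (y - t1 - t2)^2 + lam1*|t1| + lam2*t2^2.
    Profiling out the ridge part t2 leaves a one-dimensional lasso, so the
    solution is soft-thresholding plus ridge shrinkage.  (Penalty scaling is
    an assumption; other write-ups use 1/2 factors.)
    """
    k = lam2 / (1.0 + lam2)                       # weight left after profiling out t2
    theta1 = soft_threshold(y, lam1 / (2.0 * k))  # sparse part
    theta2 = (y - theta1) / (1.0 + lam2)          # dense, ridge-shrunk part
    return theta1 + theta2, theta1, theta2


def lava_regression(X, y, lam1, lam2, n_iter=500):
    """Lava-type estimate for regression with fixed design: minimize over (b1, b2)
        ||y - X(b1 + b2)||^2 + lam1*||b1||_1 + lam2*||b2||_2^2.

    Profiling out b2 turns the problem into a lasso for b1 on a ridge-projected
    response and design; that lasso is solved here with plain ISTA, purely for
    illustration.
    """
    n, p = X.shape
    G = X.T @ X + lam2 * np.eye(p)
    # K = I - X (X'X + lam2 I)^{-1} X' : the profiled ridge objective equals r'Kr.
    K = np.eye(n) - X @ np.linalg.solve(G, X.T)
    w, V = np.linalg.eigh(K)                      # K is symmetric PSD
    K_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    y_t, X_t = K_half @ y, K_half @ X

    # ISTA for ||y_t - X_t b1||^2 + lam1 ||b1||_1.
    step = 1.0 / (2.0 * np.linalg.norm(X_t, 2) ** 2 + 1e-12)
    b1 = np.zeros(p)
    for _ in range(n_iter):
        grad = 2.0 * X_t.T @ (X_t @ b1 - y_t)
        b1 = soft_threshold(b1 - step * grad, step * lam1)
    b2 = np.linalg.solve(G, X.T @ (y - X @ b1))   # ridge part given the sparse part
    return b1 + b2, b1, b2
```

In this formulation the two classical estimators appear as limits: sending lam2 to infinity shrinks the dense part to zero and leaves the lasso, while sending lam1 to infinity removes the sparse part and leaves ridge, which is consistent with the abstract's claim that, with suitable penalty choices, lava dominates both benchmarks.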
