Sign-constrained least squares estimation for high-dimensional regression

Many regularization schemes for high-dimensional regression have been proposed, and most require choosing a tuning parameter via model selection criteria or cross-validation. We show that non-negative or sign-constrained least squares is a simple and effective regularization technique for a certain class of high-dimensional regression problems. The sign constraint must be derived from prior knowledge or an initial estimator, but no further tuning or cross-validation is necessary. Success depends on conditions that are easy to check in practice; a sufficient condition for our results is that most variables with the same sign constraint are positively correlated. For a sparse optimal predictor, we prove a non-asymptotic bound on the $\ell_1$-error of the regression coefficients. Without any further regularization, the regression vector can be estimated consistently as long as $s \log(p)/n \to 0$ as $n \to \infty$, where $s$ is the sparsity of the optimal regression vector, $p$ the number of variables, and $n$ the sample size. Network tomography is shown to be an application where the conditions for the success of non-negative least squares are naturally fulfilled, and empirical results confirm the effectiveness of the sign constraint for sparse recovery.
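As a minimal illustration of the setting described above (not the paper's own experiments), the Python sketch below fits non-negative least squares with scipy.optimize.nnls on a simulated design whose columns are all positively correlated, matching the sufficient condition, and with a sparse non-negative coefficient vector. The dimensions n, p, s, the correlation level rho, and the noise level are made-up assumptions chosen only for demonstration; note that no tuning parameter appears anywhere in the fit.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Assumed dimensions for illustration: n samples, p variables, sparsity s.
n, p, s = 200, 500, 5

# Equicorrelated design: X_ij = sqrt(rho)*z_i + sqrt(1-rho)*e_ij gives
# pairwise column correlation rho > 0, so all variables are positively
# correlated (the sufficient condition in the abstract).
rho = 0.5
z = rng.standard_normal((n, 1))
X = np.sqrt(rho) * z + np.sqrt(1 - rho) * rng.standard_normal((n, p))

# Sparse, non-negative true coefficient vector.
beta = np.zeros(p)
beta[:s] = rng.uniform(1.0, 2.0, size=s)

y = X @ beta + 0.5 * rng.standard_normal(n)

# Non-negative least squares: minimize ||X b - y||_2 subject to b >= 0.
# No regularization parameter, hence no cross-validation step.
beta_hat, _ = nnls(X, y)

print("L1 error:", np.abs(beta_hat - beta).sum())
print("estimated support:", np.flatnonzero(beta_hat > 1e-6))
```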
