A Path Algorithm for Constrained Estimation

Many least-squares problems involve affine equality and inequality constraints. Although there are a variety of methods for solving such problems, most statisticians find constrained estimation challenging. The current article proposes a new path-following algorithm for quadratic programming that replaces hard constraints by what are called exact penalties. Similar penalties arise in ℓ1-regularized model selection. In the regularization setting, penalties encapsulate prior knowledge, and penalized parameter estimates represent a trade-off between the observed data and the prior knowledge. Classical penalty methods of optimization, such as the quadratic penalty method, solve a sequence of unconstrained problems that put greater and greater stress on meeting the constraints. In the limit as the penalty constant tends to ∞, one recovers the constrained solution. In the exact penalty method, squared penalties are replaced by absolute-value penalties, and the constrained solution is recovered for a finite value of the penalty constant. The exact path-following method starts at the unconstrained solution and follows the solution path as the penalty constant increases. In the process, the solution path hits, slides along, and exits from the various constraints. Path following in Lasso-penalized regression, in contrast, starts with a large value of the penalty constant and works its way downward. In both settings, inspection of the entire solution path is revealing. Just as with the Lasso and the generalized Lasso, it is possible to plot the effective degrees of freedom along the solution path. For a strictly convex quadratic program, the exact penalty algorithm can be framed entirely in terms of the sweep operator of regression analysis. A few well-chosen examples illustrate the mechanics and potential of path following. This article has supplementary materials available online.
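To make the exact penalty idea concrete, the following minimal sketch (not the article's sweep-operator path algorithm) applies it to a toy isotonic-regression problem: the order constraints b_i <= b_{i+1} are replaced by the penalty rho * sum_i max(b_i - b_{i+1}, 0), and the penalized problem is solved with a generic SciPy solver over an increasing grid of rho values. The function name exact_penalty_fit, the slack-variable reformulation, and the data are illustrative assumptions, not taken from the article.

    # Sketch of the exact penalty method for isotonic regression:
    # minimize 0.5*||y - b||^2 subject to b_1 <= ... <= b_n,
    # with the constraints replaced by rho * sum_i max(b_i - b_{i+1}, 0).
    import numpy as np
    from scipy.optimize import minimize

    def exact_penalty_fit(y, rho):
        """Solve min_b 0.5*||y - b||^2 + rho * sum_i max(b_i - b_{i+1}, 0)."""
        n = len(y)
        m = n - 1  # number of order constraints

        def objective(z):
            # z packs the fit b and slack variables t with t_i >= max(b_i - b_{i+1}, 0).
            b, t = z[:n], z[n:]
            return 0.5 * np.sum((y - b) ** 2) + rho * np.sum(t)

        # Slack constraints: t_i >= 0 and t_i >= b_i - b_{i+1} (SLSQP uses fun(z) >= 0).
        cons = [{"type": "ineq", "fun": lambda z, i=i: z[n + i]} for i in range(m)]
        cons += [{"type": "ineq",
                  "fun": lambda z, i=i: z[n + i] - (z[i] - z[i + 1])}
                 for i in range(m)]

        z0 = np.concatenate([y, np.zeros(m)])  # start from the unconstrained fit
        res = minimize(objective, z0, method="SLSQP", constraints=cons)
        return res.x[:n]

    y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
    for rho in [0.0, 0.25, 0.5, 1.0, 2.0]:
        b = exact_penalty_fit(y, rho)
        print(f"rho = {rho:4.2f}  fit = {np.round(b, 3)}")

For rho = 0 the fit equals the unconstrained solution y; as rho grows, violated adjacent pairs merge, and the fully isotonic (constrained) solution is reached at a finite value of rho, which is the defining property of an exact penalty. The article follows this path analytically via the sweep operator rather than re-solving at each rho on a grid.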
