A Generic Path Algorithm for Regularized Statistical Estimation

Regularization is widely used in statistics and machine learning to prevent overfitting and to steer solutions toward prior information. In general, a regularized estimation problem minimizes the sum of a loss function and a penalty term. The penalty term is usually weighted by a tuning parameter and encourages certain constraints on the parameters to be estimated. Particular choices of constraints lead to the popular lasso, fused-lasso, and other generalized ℓ1 penalized regression methods. In this article we follow a recent idea by Wu and propose an exact path solver based on ordinary differential equations (EPSODE) that works for any convex loss function and can handle generalized ℓ1 penalties as well as more complicated regularization, such as the inequality constraints encountered in shape-restricted regressions and nonparametric density estimation. Nonasymptotic error bounds for the equality regularized estimates are derived. In practice, EPSODE can be coupled with AIC, BIC, Cp, or cross-validation to select an optimal tuning parameter, or it can provide a convenient model space for performing model averaging or aggregation. Our applications to generalized ℓ1 regularized generalized linear models, shape-restricted regressions, Gaussian graphical models, and nonparametric density estimation showcase the potential of the EPSODE algorithm. Supplementary materials for this article are available online.
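To make the "loss plus weighted penalty" formulation and the notion of a solution path concrete, the following minimal sketch traces the lasso path in the simplest special case, an identity design, where the minimizer of 0.5 * sum((z_j - b_j)^2) + rho * sum(|b_j|) is available in closed form by soft-thresholding. This is not the EPSODE algorithm, which targets general convex losses and penalties; the helper names `soft_threshold`, `lasso_path`, and `objective`, and the toy data `z`, are assumptions made purely for illustration.

```python
def soft_threshold(z, rho):
    """Minimizer of 0.5 * (z - b)**2 + rho * abs(b) over the scalar b."""
    if z > rho:
        return z - rho
    if z < -rho:
        return z + rho
    return 0.0


def lasso_path(z, rhos):
    """Identity-design lasso solution at each tuning parameter in rhos."""
    return [[soft_threshold(zj, rho) for zj in z] for rho in rhos]


def objective(beta, z, rho):
    """Squared-error loss plus the rho-weighted l1 penalty."""
    loss = 0.5 * sum((zj - bj) ** 2 for zj, bj in zip(z, beta))
    penalty = sum(abs(bj) for bj in beta)
    return loss + rho * penalty


# Toy observations (hypothetical); coefficient j leaves the model once rho >= |z_j|.
z = [3.0, -1.5, 0.5]
rhos = [0.0, 1.0, 2.0, 3.0]
path = lasso_path(z, rhos)

for rho, beta in zip(rhos, path):
    print(rho, beta, objective(beta, z, rho))
```

Each coefficient shrinks linearly in the tuning parameter and is set exactly to zero once the parameter reaches the magnitude of the corresponding observation, so the whole path is piecewise linear. For a general convex loss the path is typically curved between such events, which is the setting where an ODE-based path solver becomes relevant.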
