lassopack: Model selection and prediction with regularized regression in Stata

In this article, we introduce lassopack, a suite of programs for regularized regression in Stata. lassopack implements the lasso, square-root lasso, elastic net, ridge regression, adaptive lasso, and postestimation ordinary least squares (OLS). The methods are suitable for the high-dimensional setting, where the number of predictors, p, may be large and possibly greater than the number of observations, n. We offer three approaches for selecting the penalization (“tuning”) parameters: information criteria (implemented in lasso2); K-fold cross-validation and h-step-ahead rolling cross-validation for cross-section, panel, and time-series data (cvlasso); and theory-driven (“rigorous” or plugin) penalization for the lasso and square-root lasso for cross-section and panel data (rlasso). We discuss the theoretical framework and practical considerations for each approach. We also present Monte Carlo results comparing the performance of the penalization approaches.
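To make the estimators and the K-fold cross-validation idea concrete, here is a minimal Python sketch. It is not lassopack's Stata code and is offered for illustration only. It fits an elastic net by cyclic coordinate descent, so alpha = 1 gives the lasso and alpha = 0 gives ridge regression. It then picks the penalty level λ by K-fold cross-validation over a decreasing grid that starts at the smallest λ setting every coefficient to zero. Several choices are assumptions of the sketch rather than lasso2 or cvlasso behavior: the glmnet-style objective scaling (1/(2n) on the squared-error loss), the function names, the grid, and the simulated data. The real commands' parameterization of λ and α, penalty loadings, and standardization defaults may differ.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=1.0, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for the (assumed, glmnet-style) objective
        (1/(2n)) * ||y - a - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    alpha=1 is the lasso, alpha=0 is ridge; the intercept a is not penalized.
    Coefficients are penalized on the scale of the supplied columns (no internal
    standardization). Returns (a, b)."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]   # centering absorbs the intercept
    r = [yi - ybar for yi in y]                                 # residual y - X b at b = 0
    v = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]  # (1/n) x_j'x_j
    b = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if v[j] == 0.0:
                continue  # constant column: coefficient stays at zero
            # (1/n) x_j' (partial residual that leaves out predictor j)
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + v[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (v[j] + lam * (1.0 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta  # keep the residual in sync with b
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    a = ybar - sum(xbar[j] * b[j] for j in range(p))
    return a, b


def predict(a, b, x):
    """Linear prediction a + x'b for one observation."""
    return a + sum(xj * bj for xj, bj in zip(x, b))


def lambda_max(X, y, alpha=1.0):
    """Smallest lam at which every lasso/elastic-net coefficient is zero (needs alpha > 0)."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    return max(abs(sum((X[i][j] - xbar[j]) * (y[i] - ybar) for i in range(n)) / n)
               for j in range(p)) / alpha


def kfold_cv(X, y, lambdas, alpha=1.0, k=5, seed=0):
    """K-fold cross-validation: return the lam with the lowest mean squared
    prediction error (MSPE) on the held-out folds, and the MSPE for every lam."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    mspe = []
    for lam in lambdas:
        sq_err = 0.0
        for test in folds:
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            a, b = elastic_net([X[i] for i in train], [y[i] for i in train], lam, alpha)
            sq_err += sum((y[i] - predict(a, b, X[i])) ** 2 for i in test)
        mspe.append(sq_err / n)
    best = min(range(len(lambdas)), key=lambda m: mspe[m])
    return lambdas[best], mspe


if __name__ == "__main__":
    # Hypothetical sparse design: only the first two of six predictors matter.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(60)]
    y = [3 * x[0] - 2 * x[1] + rng.gauss(0, 1) for x in X]
    lmax = lambda_max(X, y)
    grid = [lmax * 0.7 ** m for m in range(8)]  # decreasing lambda path
    lam_cv, mspe = kfold_cv(X, y, grid)
    a, b = elastic_net(X, y, lam_cv)
    print("chosen lambda:", round(lam_cv, 3), "coefficients:", [round(bj, 3) for bj in b])
```

Starting the grid at `lambda_max` mirrors the common practice of tracing a penalty path from the fully sparse model downward. Setting `lam=0.0` with `alpha=1.0` reduces the fit to OLS, which makes a convenient sanity check.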
