Sparse Estimation and Uncertainty with Application to Subgroup Analysis

We introduce a Bayesian method, LASSOplus, that unifies recent contributions in the sparse modeling literature while substantially extending existing estimators in both performance and flexibility. Unlike existing Bayesian variable selection methods, LASSOplus both selects and estimates effects while returning estimated confidence intervals for discovered effects. Furthermore, we show how LASSOplus easily extends to modeling repeated observations and permits a simple Bonferroni correction to control coverage of confidence intervals among discovered effects. We situate LASSOplus in the literature on estimating subgroup effects, a topic that often leads to a proliferation of parameters to estimate. We also offer a simple preprocessing step that draws on recent theoretical work to estimate higher-order effects that can be interpreted independently of their lower-order terms. A simulation study illustrates the method’s performance relative to several existing variable selection methods. In addition, we apply LASSOplus to an existing study on public support for climate treaties to illustrate the method’s ability to discover substantive and relevant effects. Software implementing the method is publicly available in the R package sparsereg.
