Subset selection for linear mixed models

Linear mixed models (LMMs) are instrumental for regression analysis with structured dependence, such as grouped, clustered, or multilevel data. However, selection among the covariates—while accounting for this structured dependence—remains a challenge. We introduce a Bayesian decision analysis for subset selection with LMMs. Using a Mahalanobis loss function that incorporates the structured dependence, we derive optimal linear actions for any subset of covariates and under any Bayesian LMM. Crucially, these actions inherit shrinkage or regularization and uncertainty quantification from the underlying Bayesian LMM. Rather than selecting a single “best” subset, which is often unstable and limited in its information content, we collect the acceptable family of subsets that nearly match the predictive ability of the “best” subset. The acceptable family is summarized by its smallest member and key variable importance metrics. Customized subset search and out-of-sample approximation algorithms are provided for more scalable computing. These tools are applied to simulated data and a longitudinal physical activity dataset, and in both cases demonstrate excellent prediction, estimation, and selection ability.
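The optimal linear action described above can be illustrated with a small sketch. Under a Mahalanobis loss weighted by a covariance matrix Σ that encodes the structured dependence, the optimal coefficients for a covariate subset S reduce to a generalized-least-squares projection of the model's predictive mean onto the columns in S. The example below is a minimal illustration under assumed inputs: `y_hat` stands in for the posterior predictive mean from a fitted Bayesian LMM, and `Sigma` is a hypothetical block-diagonal (compound-symmetry) working covariance; neither comes from the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))

# Hypothetical posterior summaries from a fitted Bayesian LMM:
#   y_hat : posterior predictive mean of the response
#   Sigma : working covariance encoding within-group dependence
#           (here, compound symmetry within 10 groups of 5)
beta_true = np.array([1.0, 0.0, -2.0, 0.0])
groups = np.repeat(np.arange(10), 5)
Sigma = 0.5 * (groups[:, None] == groups[None, :]) + np.eye(n)
y_hat = X @ beta_true

Sigma_inv = np.linalg.inv(Sigma)

def optimal_action(S):
    """GLS projection of the predictive mean onto the columns in S:
    argmin_b (y_hat - X_S b)' Sigma^{-1} (y_hat - X_S b)."""
    Xs = X[:, S]
    A = Xs.T @ Sigma_inv @ Xs
    b = Xs.T @ Sigma_inv @ y_hat
    return np.linalg.solve(A, b)

beta_S = optimal_action([0, 2])  # coefficients for the subset {x0, x2}
```

Because `y_hat` lies exactly in the span of columns 0 and 2 in this toy setup, the projection recovers the nonzero coefficients (1.0 and -2.0); with a real posterior predictive mean the action would instead inherit the underlying model's shrinkage.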
