Ensemble post-processing is a promising method to obtain flexible distributed lag models

Distributed lag models (DLMs) are regression models that include multiple lagged exposure variables as covariates. They are frequently used to model the relationship between daily mortality and short-term air pollution exposure. Specifying a maximum lag is only one of the difficulties in using a DLM in environmental epidemiology. We propose an easily extensible ensemble post-processing approach. The resulting estimates are both more parsimonious, approaching zero with increasing lag, and more efficient. These benefits are shown to be robust across various simulation scenarios and are illustrated with data from the National Morbidity, Mortality and Air Pollution Study.
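
To make the definition of a DLM concrete, the following minimal sketch builds the lagged exposure covariates and evaluates a linear predictor from them. It illustrates only the basic unconstrained model structure, not the ensemble post-processing approach proposed here. The function names `lag_matrix` and `linear_predictor`, the exposure values, and the coefficients are all hypothetical and chosen purely for illustration.

```python
def lag_matrix(exposure, max_lag):
    """Build the DLM covariate rows [x_t, x_{t-1}, ..., x_{t-max_lag}].

    Only days t with a complete lag history are kept, so the result has
    len(exposure) - max_lag rows.
    """
    if max_lag < 0:
        raise ValueError("max_lag must be non-negative")
    return [
        [exposure[t - lag] for lag in range(max_lag + 1)]
        for t in range(max_lag, len(exposure))
    ]


def linear_predictor(rows, intercept, lag_coefs):
    """Evaluate intercept + sum_l beta_l * x_{t-l} for each row.

    In a mortality study this would typically sit on the log scale of a
    count regression; that choice is an assumption, not part of this sketch.
    """
    return [
        intercept + sum(b * x for b, x in zip(lag_coefs, row))
        for row in rows
    ]


# Made-up daily exposure series (illustrative values only).
exposure = [10.0, 12.0, 9.0, 15.0, 11.0]
rows = lag_matrix(exposure, max_lag=2)

# Hypothetical lag coefficients that shrink toward zero with increasing lag.
eta = linear_predictor(rows, intercept=0.0, lag_coefs=[0.5, 0.25, 0.0])
```

Note that `max_lag` has to be fixed before the covariates can be built, and each additional lag adds one coefficient to estimate. This is the specification difficulty mentioned above.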
