Using ecological propensity score to adjust for missing confounders in small area studies

Small area ecological studies are commonly used in epidemiology to assess the impact of area level risk factors on health outcomes when data are only available in an aggregated form. However, the resulting estimates are often biased due to unmeasured confounders, which typically are not available from the standard administrative registries used for these studies. Extra information on confounders can be provided through external data sets such as surveys or cohorts, where the data are available at the individual level rather than at the area level; however, such data typically lack the geographical coverage of administrative registries. We develop an analytical framework which combines ecological and individual level data from different sources to provide a less biased, adjusted estimate of the effect of area level risk factors. Our method (i) summarizes all available individual level confounders into an area level scalar variable, which we call the ecological propensity score (EPS), (ii) implements a hierarchical structured approach to impute the values of the EPS wherever they are missing, and (iii) includes the estimated and imputed EPS in the ecological regression linking the risk factors to the health outcome. Through a simulation study, we show that integrating individual level data into small area analyses via the EPS is a promising method to reduce the bias intrinsic to ecological studies due to unmeasured confounders; we also apply the method to a real case study to evaluate the effect of air pollution on coronary heart disease hospital admissions in Greater London.
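
The three steps (i) to (iii) can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a continuous exposure, defines the EPS as the area mean of the exposure predicted from individual level confounders, and replaces the hierarchical structured imputation with a plain least-squares regression on an area level proxy and the exposure. All variable names, sample sizes and effect sizes are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_areas, n_survey, n_ind = 300, 100, 50   # hypothetical sizes
true_beta_x = 0.1                          # hypothetical log relative risk

# Simulated area level truth: two confounders, an exposure that depends on
# them, and disease counts that depend on both exposure and confounders.
c = rng.normal(size=(n_areas, 2))
s = c.sum(axis=1)
x = 0.6 * s + rng.normal(scale=0.5, size=n_areas)
expected = np.full(n_areas, 100.0)
y = rng.poisson(expected * np.exp(-0.1 + true_beta_x * x + 0.3 * s))

# Area level proxy (e.g. a deprivation index) available for every area.
w = s + rng.normal(scale=0.5, size=n_areas)


def fit_poisson(X, y, expected, n_iter=25):
    """Poisson log-linear regression with offset log(expected), via Newton steps."""
    offset = np.log(expected)
    # Start from a least-squares fit on the log scale for stability.
    beta, *_ = np.linalg.lstsq(X, np.log((y + 0.5) / expected), rcond=None)
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * mu[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# (i) Individual level survey in a subset of areas: regress the area exposure
# on individual confounders and average the fitted values within each area.
surveyed = np.arange(n_survey)
z = c[surveyed][:, None, :] + rng.normal(size=(n_survey, n_ind, 2))
Z = np.column_stack([np.ones(n_survey * n_ind), z.reshape(-1, 2)])
x_ind = np.repeat(x[surveyed], n_ind)
gamma, *_ = np.linalg.lstsq(Z, x_ind, rcond=None)
eps = np.full(n_areas, np.nan)
eps[surveyed] = (Z @ gamma).reshape(n_survey, n_ind).mean(axis=1)

# (ii) Impute the EPS where no survey exists. A simple regression on the proxy
# and the exposure stands in for the hierarchical structured model.
missing = np.setdiff1d(np.arange(n_areas), surveyed)
A = np.column_stack([np.ones(n_areas), w, x])
delta, *_ = np.linalg.lstsq(A[surveyed], eps[surveyed], rcond=None)
eps_filled = eps.copy()
eps_filled[missing] = A[missing] @ delta

# (iii) Ecological regression with and without the EPS.
ones = np.ones(n_areas)
naive = fit_poisson(np.column_stack([ones, x]), y, expected)[1]
adjusted = fit_poisson(np.column_stack([ones, x, eps_filled]), y, expected)[1]
print(f"true {true_beta_x:.3f}  naive {naive:.3f}  EPS-adjusted {adjusted:.3f}")
```

In this toy setting the confounders affect the outcome through the same linear combination that drives the exposure, which is the most favourable case for a scalar summary; a single imputation by regression also ignores the imputation uncertainty that a fully Bayesian hierarchical model would propagate.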
