Bayesian graphical models for regression on multiple data sets with different variables

Routinely collected administrative data sets, such as national registers, aim to collect information on a limited number of variables for the whole population. In contrast, survey and cohort studies contain more detailed data from a sample of the population. This paper describes Bayesian graphical models for fitting a common regression model to a combination of data sets with different sets of covariates. The methods are applied to a study of low birth weight and air pollution in England and Wales using a combination of register, survey, and small-area aggregate data. We discuss issues such as multiple imputation of confounding variables missing in one data set, survey selection bias, and appropriate propagation of information between model components. From the register data alone, there appears to be an association between low birth weight and environmental exposure to NO2, but after adjusting for confounding by ethnicity and maternal smoking by combining the register and survey data under our models, we find no significant association. However, when birth weight is modeled as a continuous variable, NO2 is associated with a small but significant reduction in birth weight.
