Spatial Regression Models for Large-Cohort Studies Linking Community Air Pollution and Health

Cohort study designs are often used to assess the association between community-based ambient air pollution concentrations and health outcomes such as mortality, the development and prevalence of disease, and pulmonary function. Typically, a large number of subjects are enrolled in each of a small number of communities, and fixed-site monitors are used to characterize long-term exposure to ambient pollution. The association between community average pollution levels and health is then estimated after controlling for risk factors for the health outcome that are measured at the individual level (e.g., smoking). We present a new spatial regression model linking spatial variation in ambient air pollution to health. Health outcomes can be measured as continuous variables (pulmonary function), binary variables (prevalence of disease), or time-to-event data (survival or development of disease). The model incorporates risk factors measured at the individual level, such as smoking, and at the community level, such as air pollution. Spatial autocorrelation in community health outcomes indicates that potentially confounding risk factors for the air pollution–health association have not been fully characterized. We demonstrate that this autocorrelation can be accounted for by including location in the deterministic component of the model, by specifying a distance-decay spatial autocorrelation function in its stochastic component, or both. We present a statistical approach that is feasible to implement for very large cohort studies. Our methods are illustrated with an analysis of the American Cancer Society cohort to determine whether the prevalence of heart disease is associated with concentrations of sulfate particles. From a statistical point of view, a location surface in the deterministic component of the model appeared to be preferred over a distance-decay autocorrelation structure in the model's stochastic component.
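The two ways of handling spatial autocorrelation described above can be contrasted with a small community-level sketch. This is an illustration under stated assumptions, not the authors' implementation. It assumes that each community already has an outcome summary adjusted for individual-level risk factors (for example, an adjusted log-odds of heart disease). It also assumes that the location surface is a simple linear trend in the coordinates and that the distance-decay function is exponential, rho(d) = exp(-d / phi), with a known range phi. The function names (`exp_corr`, `gls`, `fit_location_surface`, `fit_distance_decay`) and the synthetic data are hypothetical.

```python
import numpy as np

def exp_corr(coords, phi):
    """Hypothetical distance-decay correlation: rho(d) = exp(-d / phi)."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    return np.exp(-d / phi)

def gls(X, y, R):
    """Generalized least squares: beta = (X' R^-1 X)^-1 X' R^-1 y."""
    Ri_X = np.linalg.solve(R, X)  # R^-1 X without forming the inverse
    Ri_y = np.linalg.solve(R, y)  # R^-1 y
    return np.linalg.solve(X.T @ Ri_X, X.T @ Ri_y)

def fit_location_surface(pollution, coords, y):
    """Location in the deterministic component: OLS with a linear trend surface."""
    X = np.column_stack([np.ones_like(pollution), pollution, coords])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [intercept, pollution, x-trend, y-trend]

def fit_distance_decay(pollution, coords, y, phi):
    """Autocorrelation in the stochastic component: GLS with exponential decay."""
    X = np.column_stack([np.ones_like(pollution), pollution])
    return gls(X, y, exp_corr(coords, phi))  # [intercept, pollution]

# Synthetic (hypothetical) community-level data with a smooth east-west trend
rng = np.random.default_rng(0)
n = 50
coords = rng.uniform(0, 10, size=(n, 2))
pollution = rng.normal(5, 1, size=n)
y = 0.2 + 0.1 * pollution + 0.05 * coords[:, 0] + rng.normal(0, 0.05, size=n)

print(fit_location_surface(pollution, coords, y))
print(fit_distance_decay(pollution, coords, y, phi=2.0))
```

In the first fit, the trend terms absorb the smooth spatial pattern directly as covariates. In the second, the same pattern is treated as correlated error, so nearby communities count as less independent information. A fuller analysis would estimate phi and could combine both components, as the abstract allows.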