Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.

INTRODUCTION The United States Environmental Protection Agency (U.S. EPA) currently regulates air pollutants on a pollutant-by-pollutant basis, adjusted for other pollutants and potential confounders. However, the National Academies of Science concluded that a multipollutant regulatory approach that takes into account the joint effects of multiple constituents is likely to be more protective of human health. Unfortunately, the large majority of existing research has focused on the health effects of a single pollutant, or of a single pollutant with control for the independent effects of a small number of copollutants. Limitations in existing statistical methods are at least partially responsible for this lack of information on joint effects. The goal of this project was to fill this gap by developing flexible statistical methods to estimate the joint effects of multiple pollutants, while allowing for potential nonlinear or nonadditive associations between a given pollutant and the health outcome of interest.

METHODS We proposed Bayesian kernel machine regression (BKMR) methods as a way to simultaneously achieve the multifaceted goals of variable selection, flexible estimation of the exposure-response relationship, and inference on the strength of the association between individual pollutants and health outcomes in a health effects analysis of mixtures. We first developed a BKMR variable-selection approach, which we call component-wise variable selection, to make estimating such a potentially complex exposure-response function possible by effectively using two types of penalization (or regularization) of the multivariate exposure-response surface. Next we developed an extension of this first variable-selection approach that incorporates knowledge about how pollutants might group together, such as multiple constituents of particulate matter that might represent a common pollution source category.
This second grouped, or hierarchical, variable-selection procedure is applicable when groups of highly correlated pollutants are being studied. To investigate the properties of the proposed methods, we conducted three simulation studies designed to evaluate the ability of BKMR to estimate environmental mixtures responsible for health effects under potentially complex but plausible exposure-response relationships. An attractive feature of our simulation studies is that we used actual exposure data rather than simulated values. This real-data simulation approach allowed us to evaluate the performance of BKMR and several other models under realistic joint distributions of multipollutant exposure. The simulation studies compared the two proposed variable-selection approaches (component-wise and hierarchical variable selection) with each other and with existing frequentist treatments of kernel machine regression (KMR). After the simulation studies, we applied the newly developed methods to an epidemiologic data set and to a toxicologic data set. To illustrate the applicability of the proposed methods to human epidemiologic data, we estimated associations between short-term exposures to fine particulate matter constituents and blood pressure in the Maintenance of Balance, Independent Living, Intellect, and Zest in the Elderly (MOBILIZE) Boston study, a prospective cohort study of elderly subjects. To illustrate the applicability of these methods to animal toxicologic studies, we analyzed data on the associations of blood pressure and heart rate with the composition of concentrated ambient particles (CAPs) in exposed canines, from a study conducted at the Harvard T. H. Chan School of Public Health (the Harvard Chan School; formerly Harvard School of Public Health; Bartoli et al. 2009).

RESULTS We successfully developed the theory and computational tools required to apply the proposed methods to the motivating data sets.
Collectively, the three simulation studies showed that component-wise variable selection can identify important pollutants within a mixture as long as the correlations among pollutant concentrations are low to moderate. The hierarchical variable-selection method was more effective in high-dimension, high-correlation settings. Variable selection in existing frequentist KMR models can incur inflated type I error rates, particularly when pollutants are highly correlated. The analyses of the MOBILIZE data yielded evidence of a linear and additive association of black carbon (BC) or copper (Cu) exposure with standing diastolic blood pressure (DBP), and a linear association of sulfur (S) exposure with standing systolic blood pressure (SBP). Cu is thought to be a marker of urban road dust associated with traffic, and S is a marker of power plant emissions or regional long-range transported air pollution or both. Therefore, these analyses of the MOBILIZE data set suggest that emissions from these three source categories were most strongly associated with hemodynamic responses in this cohort. In contrast, in the Harvard Chan School canine study, after controlling for an overall effect of CAPs exposure, we did not observe any associations between DBP or SBP and any elemental concentrations. Instead, we observed strong evidence of an association between manganese (Mn) concentrations and heart rate: heart rate increased linearly with increasing concentrations of Mn. According to the positive matrix factorization (PMF) source apportionment analyses of the multipollutant data set from the Harvard Chan School Boston Supersite, Mn loads on the two factors that represent the mobile and road dust source categories. The results of the BKMR analyses in both the MOBILIZE and canine studies were similar to those from existing linear mixed model analyses of the same multipollutant data because the effects have linear and additive forms that could also have been detected using standard methods.
CONCLUSIONS This work provides several contributions to the KMR literature. First, to our knowledge, this is the first time KMR methods have been used to estimate the health effects of multipollutant mixtures. Second, we developed a novel hierarchical variable-selection approach within BKMR that is able to account for the structure of the mixture and systematically handle highly correlated exposures. The analyses of the epidemiologic and toxicologic data on associations between fine particulate matter constituents and blood pressure or heart rate demonstrated associations with constituents that are typically linked to traffic emissions, power plants, and long-range transported pollutants. The simulation studies showed that the BKMR methods proposed here work well for small to moderate data sets; more work is needed to develop computationally fast methods for large data sets, which will be a goal of future work.
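
To make the idea of component-wise variable selection in a kernel machine more concrete, the following is a minimal sketch in Python. It is not the BKMR implementation described in this report. It assumes a Gaussian kernel with one nonnegative weight per pollutant (the array r below) and fits the exposure-response surface by ordinary kernel ridge regression with the weights held fixed, whereas a Bayesian treatment would place priors on the weights and estimate them from the data. The function names, the simulated exposures, and the values of r and lam are all hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_gaussian_kernel(Z1, Z2, r):
    """K[i, j] = exp(-sum_m r[m] * (Z1[i, m] - Z2[j, m])**2).

    r[m] is an assumed per-pollutant weight; r[m] = 0 means pollutant m
    has no influence on the kernel, i.e. it is effectively deselected.
    """
    diff = Z1[:, None, :] - Z2[None, :, :]          # shape (n1, n2, M)
    return np.exp(-np.einsum("ijm,m->ij", diff ** 2, r))

def fit_kernel_ridge(Z, y, r, lam):
    """Solve (K + lam * I) alpha = y for the kernel ridge coefficients."""
    K = weighted_gaussian_kernel(Z, Z, r)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def predict(Z_new, Z, alpha, r):
    """Estimated exposure-response surface h(z) = K(z, Z) @ alpha."""
    return weighted_gaussian_kernel(Z_new, Z, r) @ alpha

# Hypothetical simulated data: three "pollutants", of which only the
# first two affect the outcome (nonlinearly and additively here).
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Weights fixed by hand for illustration; the third pollutant is excluded.
r = np.array([1.0, 1.0, 0.0])
alpha = fit_kernel_ridge(Z, y, r, lam=0.1)

# Two exposure profiles that differ only in the excluded pollutant.
z_a = np.array([[0.3, -0.2, -2.0]])
z_b = np.array([[0.3, -0.2, 2.0]])
same = np.allclose(predict(z_a, Z, alpha, r), predict(z_b, Z, alpha, r))
print("prediction unchanged by excluded pollutant:", same)
```

Because the weight on the third pollutant is zero, the kernel and therefore the fitted surface cannot depend on it, which is the sense in which a zero weight removes a pollutant from the mixture. The ridge penalty lam plays the role of a smoothness penalty on the surface; how the two kinds of penalization are combined in BKMR itself is not specified in this abstract.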