Application of the deletion/substitution/addition algorithm to selecting land use regression models for interpolating air pollution measurements in California

Abstract: Land use regression (LUR) models are widely employed in health studies to characterize chronic exposure to air pollution. The LUR is essentially an interpolation technique that employs the pollutant of interest as the dependent variable with proximate land use, traffic, and physical environmental variables used as independent predictors. Two major limitations with this method have not been addressed: (1) variable selection in the model building process, and (2) dealing with unbalanced repeated measures. In this paper, we address these issues with a modeling framework that implements the deletion/substitution/addition (DSA) machine learning algorithm that uses a generalized linear model to average over unbalanced temporal observations. Models were derived for fine particulate matter with aerodynamic diameter of 2.5 microns or less (PM2.5) and nitrogen dioxide (NO2) using monthly observations. We used 4119 observations at 108 sites and 15,301 observations at 138 sites for PM2.5 and NO2, respectively. We derived models with good predictive capacity (cross-validated R² values were 0.65 and 0.71 for PM2.5 and NO2, respectively). By addressing these two shortcomings in current approaches to LUR modeling, we have developed a framework that minimizes arbitrary decisions during the model selection process. We have also demonstrated how to integrate temporally unbalanced data in a theoretically sound manner. These developments could have widespread applicability for future LUR modeling efforts.
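To make the idea of a deletion/substitution/addition search concrete, the following is a minimal sketch, not the authors' implementation. It assumes main-effect linear terms only, uses the cross-validated mean squared error directly as the search criterion, and assigns cross-validation folds by site so that repeated monthly values from one site never fall on both sides of a split. The published DSA procedure is more general (it can, for example, consider interaction and higher-order terms). All function names, the synthetic data, and the parameter values are hypothetical.

```python
import random

def ols_fit(rows, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan).
    Assumes the design matrix has full column rank."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def cv_mse(X, y, cols, folds):
    """Pooled held-out squared error of an intercept + `cols` linear model."""
    sq_err, n = 0.0, 0
    for k in set(folds):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        design = [[1.0] + [X[i][j] for j in cols] for i in train]
        beta = ols_fit(design, [y[i] for i in train])
        for i in test:
            pred = beta[0] + sum(b * X[i][j] for b, j in zip(beta[1:], cols))
            sq_err += (y[i] - pred) ** 2
            n += 1
    return sq_err / n

def neighbours(cols, p, max_size):
    """All models reachable by one deletion, substitution, or addition."""
    current = set(cols)
    unused = [j for j in range(p) if j not in current]
    moves = [sorted(current - {j}) for j in cols]                      # deletion
    moves += [sorted((current - {j}) | {k}) for j in cols for k in unused]  # substitution
    if len(cols) < max_size:
        moves += [sorted(current | {k}) for k in unused]               # addition
    return moves

def dsa_like_search(X, y, folds, max_size=4):
    """Greedy search: take the best move until no move lowers the CV error."""
    best, best_err = [], cv_mse(X, y, [], folds)
    while True:
        scored = [(cv_mse(X, y, m, folds), m)
                  for m in neighbours(best, len(X[0]), max_size)]
        if not scored or min(scored)[0] >= best_err:
            return best, best_err
        best_err, best = min(scored)

def make_demo_data(seed=0, n_sites=40, n_pred=6):
    """Synthetic data (hypothetical): only predictors 0 and 2 drive the response."""
    rng = random.Random(seed)
    X, y, folds = [], [], []
    for site in range(n_sites):
        x = [rng.gauss(0, 1) for _ in range(n_pred)]   # site-level predictors
        for _ in range(rng.randint(1, 12)):            # unbalanced monthly values
            X.append(x)
            y.append(10 + 2 * x[0] - 3 * x[2] + rng.gauss(0, 0.5))
            folds.append(site % 5)                     # folds assigned by site
    return X, y, folds

X, y, folds = make_demo_data()
selected, err = dsa_like_search(X, y, folds)
# Cross-validated R^2 relative to the intercept-only model.
cv_r2 = 1 - err / cv_mse(X, y, [], folds)
print("selected predictors:", selected, "cross-validated R2:", round(cv_r2, 2))
```

Because every candidate model is scored by the same held-out error, the analyst does not hand-pick which predictors enter, which is the sense in which such a search reduces arbitrary decisions during model selection. Sites with more monthly values carry more weight in this pooled sketch; a real analysis would need to decide how to weight them.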
