Statistical Approaches to Address Multi-Pollutant Mixtures and Multiple Exposures: the State of the Science

Purpose of Review: This review describes the most recent statistical approaches for estimating the effect of multi-pollutant mixtures or multiple correlated exposures on human health.

Recent Findings: The health effects of environmental chemicals and air pollutants have been widely described. Exposure often involves a complex mixture of different substances, potentially highly correlated with each other and with other (environmental) stressors. Single-exposure approaches cannot disentangle the effects of individual factors and fail to detect potential interactions between exposures. In recent years, sophisticated methods have been developed to investigate the joint or independent health effects of multi-pollutant mixtures or multiple environmental exposures.

Summary: A classification of the most recent methods is proposed. A non-technical description of each method is provided, together with epidemiological applications and operational details for implementation with standard software.
