Characterization of Weighted Quantile Sum Regression for Highly Correlated Data in a Risk Analysis Setting

In risk evaluation, the effect of mixtures of environmental chemicals on a common adverse outcome is of interest. However, because of the high dimensionality and the inherent correlations among chemicals that occur together, traditional methods (e.g., ordinary or logistic regression) suffer from collinearity and variance inflation, and shrinkage methods have limitations in selecting among correlated components. We propose a weighted quantile sum (WQS) approach to estimating a body burden index, which identifies "bad actors" in a set of highly correlated environmental chemicals. Through extensive simulation studies, we evaluate the accuracy of WQS regression for variable selection in terms of sensitivity and specificity (i.e., the ability of the WQS method to select the bad actors correctly and to avoid selecting inactive components). We demonstrate the improvement in accuracy this method provides over ordinary regression and over the shrinkage methods lasso, adaptive lasso, and elastic net. Simulation results show that WQS regression is accurate under some environmentally relevant conditions, but that, for a fixed correlation pattern, its accuracy decreases as the association with the response variable weakens. Nonzero weights (i.e., weights exceeding a selection threshold parameter) may be used to identify bad actors; however, components within a cluster of highly correlated active components tend to have lower individual weights, with the sum of their weights representative of the set. Supplementary materials accompanying this paper appear online.
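The estimation and selection steps described above can be illustrated with a small sketch. This is not the authors' implementation; it assumes a continuous outcome, quartile scoring of each component, a positive association (b1 >= 0), and a single fit on the full data, omitting any bootstrap averaging or training/validation split that a complete WQS analysis may use. The helper names (`quantile_scores`, `fit_wqs_weights`, `select_bad_actors`, `simulate`), the simulated data, and the default threshold tau = 1/c are hypothetical choices for illustration. Because the model y = b0 + b1 * sum_i w_i q_i with w_i >= 0, sum_i w_i = 1 and b1 >= 0 can be rewritten with beta_i = b1 * w_i >= 0, the sketch solves a nonnegative least-squares problem by coordinate descent and then recovers w_i = beta_i / sum_j beta_j and b1 = sum_j beta_j.

```python
import bisect
import random
import statistics


def quantile_scores(values, n_quantiles=4):
    """Score each value 0..n_quantiles-1 by the quantile bin it falls into."""
    cuts = statistics.quantiles(values, n=n_quantiles)  # n_quantiles - 1 cut points
    return [bisect.bisect_right(cuts, v) for v in values]


def fit_wqs_weights(q_cols, y, sweeps=500, tol=1e-10):
    """Fit y = b0 + b1 * sum_i w_i q_i with w_i >= 0, sum w_i = 1, b1 >= 0.

    Reparametrizing beta_i = b1 * w_i gives nonnegative least squares,
    solved here by cyclic coordinate descent on centered data.
    """
    n, c = len(y), len(q_cols)
    y_mean = sum(y) / n
    means = [sum(col) / n for col in q_cols]
    xc = [[v - m for v in col] for col, m in zip(q_cols, means)]
    resid = [v - y_mean for v in y]  # residual when every beta_i = 0
    norms = [sum(v * v for v in col) for col in xc]
    beta = [0.0] * c
    for _ in range(sweeps):
        max_step = 0.0
        for j in range(c):
            # Exact minimizer along coordinate j, projected onto beta_j >= 0.
            grad = sum(x * r for x, r in zip(xc[j], resid))
            new = max(0.0, beta[j] + grad / norms[j])
            step = new - beta[j]
            if step != 0.0:
                resid = [r - step * x for r, x in zip(resid, xc[j])]
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b1 = sum(beta)
    weights = [b / b1 for b in beta] if b1 > 0 else [0.0] * c
    b0 = y_mean - sum(b * m for b, m in zip(beta, means))
    return b0, b1, weights


def select_bad_actors(weights, tau=None):
    """Indices whose weight exceeds tau (hypothetical default: 1/c)."""
    tau = 1.0 / len(weights) if tau is None else tau
    return [i for i, w in enumerate(weights) if w > tau]


def simulate(n=1000, c=5, seed=1):
    """Hypothetical data: c correlated components, only the first two active."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # shared factor induces correlation
    x_cols = [[zi + rng.gauss(0, 1) for zi in z] for _ in range(c)]
    q_cols = [quantile_scores(col) for col in x_cols]
    true_w = [0.6, 0.4] + [0.0] * (c - 2)
    y = [1.0 + 2.0 * sum(w * q[i] for w, q in zip(true_w, q_cols)) + rng.gauss(0, 0.5)
         for i in range(n)]
    return q_cols, y


if __name__ == "__main__":
    q_cols, y = simulate()
    b0, b1, w = fit_wqs_weights(q_cols, y)
    print("weights:", [round(v, 3) for v in w])
    print("selected components:", select_bad_actors(w))
```

In this simulated setting the two active components should receive weights near 0.6 and 0.4 and be the only ones above the threshold. The sketch does not reproduce the paper's simulation design, and it does not show the weight-splitting behavior within clusters of highly correlated active components noted above.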
