A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures

Environmental health studies are increasingly measuring multiple pollutants to characterize the joint health effects attributable to exposure mixtures. However, the underlying dose-response relationship between toxicants and health outcomes of interest may be highly nonlinear, with possible nonlinear interaction effects. Existing penalized regression methods that account for exposure interactions either cannot accommodate nonlinear interactions while maintaining strong heredity or are computationally unstable in applications with limited sample size. In this paper, we propose a general shrinkage and selection framework to identify noteworthy nonlinear main and interaction effects among a set of exposures. We design the hierarchical integrative group LASSO (HiGLASSO) to (a) impose strong heredity constraints on two-way interaction effects (hierarchical), (b) incorporate adaptive weights without requiring initial coefficient estimates (integrative), and (c) induce sparsity for variable selection while respecting group structure (group LASSO). We prove sparsistency of the proposed method and apply HiGLASSO to an environmental toxicants dataset from the LIFECODES birth cohort, where the investigators are interested in understanding the joint effects of 21 urinary toxicant biomarkers on urinary 8-isoprostane, a measure of oxidative stress. An implementation of HiGLASSO is available in the higlasso R package, accessible through the Comprehensive R Archive Network.
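Two of the ingredients named above can be illustrated in isolation. The sketch below is written in Python/NumPy rather than R. It is not the higlasso package's code, and the function names are hypothetical. `group_soft_threshold` is the standard proximal operator of a group LASSO penalty. It shows how a whole group of basis coefficients for one exposure is set to zero at once, which is property (c). `heredity_interaction` assumes a multiplicative reparameterization of the kind used by strong-heredity methods: a free parameter `gamma_jk` multiplies the Kronecker product of the two main-effect groups. It shows why an interaction vanishes whenever either parent main effect is zero, which is property (a). Treat this parameterization as an illustrative assumption, not necessarily the exact HiGLASSO formulation.

```python
import numpy as np


def group_soft_threshold(z, lam):
    """Proximal operator of lam * ||b||_2 for a single coefficient group.

    If ||z||_2 <= lam, the entire group is set to zero (the exposure is
    dropped). Otherwise every coefficient in the group is shrunk toward
    zero by the same factor (1 - lam / ||z||_2).
    """
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z


def heredity_interaction(beta_j, beta_k, gamma_jk):
    """Hypothetical strong-heredity parameterization of a j-k interaction.

    The interaction coefficients are the elementwise product of gamma_jk
    with the Kronecker product of the main-effect groups beta_j and beta_k,
    so they are identically zero whenever beta_j or beta_k is zero.
    """
    return np.asarray(gamma_jk, dtype=float) * np.kron(beta_j, beta_k)


if __name__ == "__main__":
    # A group whose norm (5.0) exceeds lam is shrunk, not zeroed.
    print(group_soft_threshold([3.0, 4.0], lam=1.0))
    # A group whose norm (0.5) is below lam is removed entirely.
    print(group_soft_threshold([0.3, 0.4], lam=1.0))
    # A zero main effect forces the interaction to zero (strong heredity).
    print(heredity_interaction(np.zeros(2), np.array([1.0, 2.0]), np.ones(4)))
```

Because selection happens at the group level, an exposure modeled with several spline or basis coefficients enters or leaves the model as a unit. Under the multiplicative assumption, no separate constraint is needed to enforce heredity.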
