Estimating and testing direct genetic effects in directed acyclic graphs using estimating equations

In genetic association studies, it is important to distinguish direct from indirect genetic effects in order to build truly functional models. For this purpose, we consider a directed acyclic graph setting with genetic variants, primary and intermediate phenotypes, and confounding factors. Valid statistical inference on direct genetic effects on the primary phenotype requires accounting for all potential effects in the graph. To this end, we propose using estimating equations with robust Huber–White sandwich standard errors. In a simulation study, we evaluate the proposed causal inference based on estimating equations (CIEE) method and compare it with traditional multiple regression, structural equation modeling, and sequential G‐estimation. The primary phenotypes analyzed are (completely observed) quantitative traits and time‐to‐event traits subject to censoring. The results show that CIEE provides valid estimators and inference by successfully removing the effect of intermediate phenotypes from the primary phenotype. They also show that CIEE is robust against measured and unmeasured confounding of the indirect effect through observed factors. All other methods, except sequential G‐estimation for quantitative traits, fail in some scenarios, where their test statistics yield inflated type I error rates. In the analysis of the Genetic Analysis Workshop 19 dataset, we estimate and test genetic effects on blood pressure while accounting for intermediate gene expression phenotypes. The results show that CIEE can identify genetic variants that traditional regression analyses would miss. CIEE is computationally fast, widely applicable across fields, and available as an R package.
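The abstract does not spell out the CIEE estimating equations, so the following is only a minimal sketch of the general idea it names. It stacks two estimating equations and uses a Huber–White sandwich variance, in the style of sequential G‐estimation for a linear quantitative trait. Stage 1 regresses the primary phenotype y on the variant x, an intermediate phenotype k, and a measured confounder c. Stage 2 removes the estimated effect of k from y and regresses the result on x. Stacking both stages lets the sandwich variance reflect the uncertainty of the stage-1 estimate.

Several things here are assumptions for illustration: a single intermediate phenotype, a single confounder c that is not affected by x, no x-by-k interaction, and the simulated effect sizes. The function names `stacked_direct_effect` and `simulate` are hypothetical, and this is not the CIEE R package or its API.

```python
import numpy as np


def stacked_direct_effect(y, x, k, c):
    """Hypothetical two-stage estimating-equation sketch of a direct effect of x on y.

    Stage 1: OLS of y on (1, x, k, c); a[2] estimates the effect of k on y.
    Stage 2: OLS of y - a[2] * k on (1, x); b[1] estimates the direct effect of x.
    Returns (direct-effect estimate, Huber-White sandwich standard error).
    """
    n = len(y)
    z1 = np.column_stack([np.ones(n), x, k, c])
    z2 = np.column_stack([np.ones(n), x])
    p1, p2 = z1.shape[1], z2.shape[1]

    # Solve each stage's estimating equation (normal equations of OLS).
    a = np.linalg.solve(z1.T @ z1, z1.T @ y)
    y_tilde = y - a[2] * k  # primary phenotype with the effect of k removed
    b = np.linalg.solve(z2.T @ z2, z2.T @ y_tilde)

    # Per-observation stacked estimating functions, shape (n, p1 + p2).
    u = np.hstack([z1 * (y - z1 @ a)[:, None],
                   z2 * (y_tilde - z2 @ b)[:, None]])

    # Bread: minus the mean Jacobian of the stacked estimating functions.
    bread = np.zeros((p1 + p2, p1 + p2))
    bread[:p1, :p1] = z1.T @ z1 / n
    bread[p1:, 2] = z2.T @ k / n  # stage-2 equations depend on a[2]
    bread[p1:, p1:] = z2.T @ z2 / n

    # Meat: empirical second moment of the estimating functions.
    meat = u.T @ u / n

    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ meat @ bread_inv.T / n  # sandwich covariance of (a, b)
    se = np.sqrt(np.diag(cov))
    return b[1], se[p1 + 1]


def simulate(n, direct=0.3, seed=1):
    """Hypothetical DAG: x -> k -> y, x -> y, and c confounding k and y."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(2, 0.3, size=n).astype(float)  # SNP coded 0/1/2
    c = rng.normal(size=n)                          # measured confounder
    k = 0.5 * x + 0.8 * c + rng.normal(size=n)      # intermediate phenotype
    y = direct * x + 0.7 * k + 0.6 * c + rng.normal(size=n)
    return y, x, k, c


y, x, k, c = simulate(20_000)
est, se = stacked_direct_effect(y, x, k, c)
print(f"direct effect {est:.3f} (sandwich SE {se:.3f})")
```

In this simulated setting, a plain regression of y on x alone recovers the total effect (direct plus the part mediated by k), not the direct effect. The two-stage estimate targets only the direct path.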
