Bayesian weighted Mendelian randomization for causal inference based on summary statistics

MOTIVATION
Results from genome-wide association studies (GWAS) on thousands of phenotypes provide an unprecedented opportunity to infer the causal effect of one phenotype (the exposure) on another (the outcome). Mendelian randomization (MR), an instrumental variable (IV) method, has been introduced for causal inference using GWAS data. Because complex traits and diseases are polygenic and pleiotropy is ubiquitous, however, MR faces challenges that conventional IV methods do not.

RESULTS
We propose Bayesian weighted Mendelian randomization (BWMR) for causal inference to address these challenges. The BWMR model accounts for the uncertainty of weak effects owing to polygenicity, and it handles violations of the IV assumption due to pleiotropy through outlier detection by Bayesian weighting. To make causal inference based on BWMR computationally stable and efficient, we developed a variational expectation-maximization (VEM) algorithm. We also derived an exact closed-form formula to correct the posterior covariance, which is often underestimated in variational inference. Through comprehensive simulation studies, we evaluated the performance of BWMR and demonstrated its advantage over competing methods. We then applied BWMR to make causal inferences between 130 metabolites and 93 complex human traits, uncovering novel causal relationships between exposure and outcome traits.

AVAILABILITY
The BWMR software is available at https://github.com/jiazhao97/BWMR.
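For orientation, the kind of summary-statistics input that MR methods work with can be illustrated with the standard fixed-effect inverse-variance weighted (IVW) estimator. This is a minimal sketch of a common baseline, not the BWMR model or its VEM algorithm; the function name `ivw_estimate` and the toy numbers are hypothetical and chosen only for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: estimated variant-exposure effects
    beta_outcome:  estimated variant-outcome effects
    se_outcome:    standard errors of the variant-outcome effects

    Note: this baseline treats the variant-exposure effects as known
    and assumes no pleiotropy, i.e. it ignores the two issues named
    in the abstract.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if len(beta_exposure) == 0:
        raise ValueError("at least one variant is required")

    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / (se * se)  # inverse variance of the outcome effect
        numerator += weight * bx * by
        denominator += weight * bx * bx

    if denominator == 0.0:
        raise ValueError("all variant-exposure effects are zero")

    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical toy data: three variants whose outcome effects are
# exactly half of their exposure effects.
beta_exposure = [0.10, 0.20, 0.30]
beta_outcome = [0.05, 0.10, 0.15]
se_outcome = [0.01, 0.01, 0.01]

estimate, std_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(f"IVW estimate: {estimate:.3f} (SE {std_error:.4f})")
```

Because this estimator treats the variant-exposure effects as fixed and gives every variant a role determined only by its precision, a single pleiotropic variant can shift the estimate. That is the setting in which the abstract's weak-effect uncertainty and Bayesian weighting are meant to help.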
