The MR-Base platform supports systematic causal inference across the human phenome

Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes using a strategy known as 2-sample Mendelian randomization (2SMR), which bypasses the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, which undermines efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (not restricted by statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.
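
To illustrate how summary-level GWAS results alone can yield a causal estimate, the following is a minimal sketch of one common 2SMR estimator, the fixed-effect inverse-variance weighted (IVW) combination of per-SNP Wald ratios. It is not MR-Base's own code or API: the function names and the toy numbers are hypothetical, and the sketch assumes the SNPs are independent, that effect alleles have already been harmonised between the exposure and outcome GWAS, and that uncertainty in the SNP-exposure estimates can be ignored (first-order weights).

```python
import math


def wald_ratios(beta_exposure, beta_outcome):
    """Per-SNP causal estimates: SNP-outcome effect divided by SNP-exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    Assumes independent SNPs, harmonised effect alleles, and first-order
    weights (bx^2 / se_y^2) that ignore uncertainty in the exposure effects.
    """
    ratios = wald_ratios(beta_exposure, beta_outcome)
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Hypothetical summary statistics for three SNPs (illustrative values only).
beta_exposure = [0.1, 0.2, 0.4]    # SNP effects on the exposure
beta_outcome = [0.05, 0.1, 0.2]    # SNP effects on the outcome
se_outcome = [0.01, 0.01, 0.02]    # standard errors of the outcome effects

estimate, standard_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(f"IVW estimate: {estimate:.3f} (SE {standard_error:.3f})")
```

Sensitivity analyses of the kind mentioned above would replace or supplement this estimator with methods that relax the assumption of no horizontal pleiotropy.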
