EpiGraphDB: a database and data mining platform for health data science.

MOTIVATION: The wealth of data resources on human phenotypes, risk factors, molecular traits and therapeutic interventions presents new opportunities for population health sciences. These opportunities are paralleled by a growing need for data integration, curation and mining to increase research efficiency, reduce mis-inference and ensure reproducible research.

RESULTS: We developed EpiGraphDB (https://epigraphdb.org/), a graph database that integrates a range of biomedical and epidemiological relationships, together with an analytical platform to support their use in human population health data science. We also present three case studies that illustrate the value of this platform. The first uses EpiGraphDB to evaluate potential pleiotropic relationships, addressing mis-inference in systematic causal analysis. The second illustrates how protein-protein interaction data offer opportunities to identify new drug targets. The third integrates causal inference using Mendelian randomization with relationships mined from the biomedical literature to "triangulate" evidence from different sources.

AVAILABILITY: The EpiGraphDB platform is openly available at https://epigraphdb.org. Code for replicating the case study results is available as Jupyter notebooks using the API at https://github.com/MRCIEU/epigraphdb, and using the R package at https://mrcieu.github.io/epigraphdb-r.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
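Mendelian randomization (MR) estimates, such as those triangulated against literature-derived evidence in the final case study, are in their simplest form combinations of per-variant Wald ratios (outcome effect divided by exposure effect). As an illustration only, the sketch below computes the standard fixed-effect inverse-variance weighted (IVW) estimate from GWAS summary statistics. It is not EpiGraphDB's code and does not query the platform; the function name `ivw_estimate` is hypothetical and the input values in the usage example are made up.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    Each variant's Wald ratio (beta_outcome / beta_exposure) is weighted by
    beta_exposure**2 / se_outcome**2 (first-order weights). Returns the
    pooled causal estimate and its standard error.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have equal length")
    if not beta_exposure:
        raise ValueError("at least one variant is required")

    # Numerator: sum of bx * by / se_y^2 over variants.
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    # Denominator: sum of bx^2 / se_y^2 (total first-order weight).
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))

    estimate = num / den
    se_estimate = math.sqrt(1.0 / den)
    return estimate, se_estimate


if __name__ == "__main__":
    # Hypothetical summary statistics for two genetic variants.
    est, se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
    print(f"IVW estimate = {est:.3f} (SE {se:.4f})")
```

With a single variant the IVW estimate reduces to that variant's Wald ratio, with standard error se_outcome / |beta_exposure|. In practice, pleiotropy-robust methods are used alongside IVW because IVW assumes that no variant affects the outcome except through the exposure.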
