Automated extraction of potential migraine biomarkers using a semantic graph

PROBLEM Biomedical literature and databases contain important clues for the identification of potential disease biomarkers. However, searching these enormous knowledge reservoirs and integrating findings across heterogeneous sources is costly and difficult. Here we demonstrate how semantically integrated knowledge, extracted from biomedical literature and structured databases, can be used to automatically identify potential migraine biomarkers.

METHOD We used a knowledge graph containing more than 3.5 million biomedical concepts and 68.4 million relationships. Biochemical compound concepts were filtered and ranked by their potential as biomarkers based on their connections to a subgraph of migraine-related concepts. The ranked results were evaluated against the results of a systematic literature review that was performed manually by migraine researchers. Weight points were assigned to the reference compounds identified in that review to indicate their relative importance.

RESULTS The ranking generated automatically from the knowledge graph was highly consistent with the results of the manual literature review. Of the 222 reference compounds, 163 (73%) ranked in the top 2000, and these accounted for 547 of the 644 (85%) weight points assigned to the reference compounds. An extensive error analysis was performed for the reference compounds that did not rank near the top of the list. Overall, the ranking achieved a ROC-AUC of 0.974.

DISCUSSION Semantic knowledge graphs composed of information integrated from multiple and varying sources can assist researchers in identifying potential disease biomarkers.
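The evaluation measures reported above (number of reference compounds in the top k, share of weight points recovered, and ROC-AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy compound list, and the weights are hypothetical, and the ROC-AUC is computed under the assumptions that the ranking contains no ties and that every reference compound appears somewhere in the ranked list.

```python
from typing import Dict, List, Set, Tuple


def top_k_coverage(ranked: List[str], weights: Dict[str, int], k: int) -> Tuple[int, int]:
    """Return (reference compounds found, weight points recovered) in the top k."""
    top = set(ranked[:k])
    hits = [compound for compound in weights if compound in top]
    return len(hits), sum(weights[compound] for compound in hits)


def roc_auc(ranked: List[str], positives: Set[str]) -> float:
    """ROC-AUC of a tie-free ranking: the fraction of (positive, negative)
    pairs in which the positive is ranked above the negative."""
    n_pos = sum(1 for compound in ranked if compound in positives)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    negatives_seen = 0
    misordered_pairs = 0
    for compound in ranked:
        if compound in positives:
            # Every negative already seen is ranked above this positive.
            misordered_pairs += negatives_seen
        else:
            negatives_seen += 1
    return 1.0 - misordered_pairs / (n_pos * n_neg)


# Hypothetical toy data: a ranked compound list (best first) and
# reference compounds with illustrative weight points.
ranked = ["cgrp", "water", "serotonin", "glucose", "orexin"]
weights = {"cgrp": 5, "serotonin": 3, "orexin": 1}

found, points = top_k_coverage(ranked, weights, k=3)
auc = roc_auc(ranked, set(weights))
print(f"{found}/{len(weights)} reference compounds in top 3")
print(f"{points}/{sum(weights.values())} weight points recovered")
print(f"ROC-AUC = {auc:.3f}")
```

Applied to the figures in the abstract, `top_k_coverage` with k = 2000 would correspond to the reported 163 of 222 compounds and 547 of 644 weight points, although the actual ranking data are not reproduced here.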
