Re-curation and rational enrichment of knowledge graphs in Biological Expression Language

Abstract: The rapid accumulation of new biomedical literature not only causes curated knowledge graphs (KGs) to become outdated and incomplete, but also makes purely manual curation impractical and unsustainable. Automated or semi-automated workflows are needed to help prioritize and curate the literature in order to update and enrich KGs. We have developed two workflows: one for re-curating a given KG to ensure its syntactic and semantic quality, and another for rationally enriching it by manually revising automatically extracted relations for nodes with low information density. We applied these workflows to the KGs encoded in Biological Expression Language (BEL) from the NeuroMMSig database, using content pre-extracted from MEDLINE abstracts and PubMed Central full-text articles by text mining, with the output integrated by INDRA. We have made these workflows freely available at https://github.com/bel-enrichment/bel-enrichment.
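
As a rough illustration of how nodes with low information density might be prioritized for enrichment, the following minimal sketch ranks the nodes of a toy graph by their number of incident edges. It is not the implementation in the bel-enrichment repository: equating "information density" with node degree is an assumption made here, and the function `rank_low_density_nodes` and all node names are hypothetical.

```python
from collections import Counter


def rank_low_density_nodes(edges, top_n=3):
    """Return the top_n (node, degree) pairs with the fewest incident edges.

    Assumption: node degree stands in for "information density".
    """
    degree = Counter()
    for source, _relation, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Sort by degree, then by name so that ties are broken deterministically.
    return sorted(degree.items(), key=lambda item: (item[1], item[0]))[:top_n]


# Toy (subject, relation, object) triples with hypothetical node names;
# they are placeholders, not statements taken from NeuroMMSig.
edges = [
    ("geneA", "increases", "proteinC"),
    ("geneB", "increases", "proteinC"),
    ("geneA", "association", "diseaseX"),
    ("proteinC", "positiveCorrelation", "diseaseX"),
    ("geneD", "association", "diseaseX"),
]

candidates = rank_low_density_nodes(edges, top_n=2)
print(candidates)
```

In a workflow of this kind, the lowest-ranked nodes would be the ones for which automatically extracted relations are retrieved and handed to a curator for manual revision.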
