Structured reviews for data and knowledge-driven research

Motivation: Hypothesis generation is a critical step in research and a cornerstone of the rare disease field. Research is most efficient when those hypotheses draw on the entirety of knowledge available to date. Systematic review articles are commonly used in biomedicine to summarize existing knowledge and contextualize experimental data. However, the information in review articles is typically expressed only as free text, which is difficult to use computationally. Researchers struggle to navigate, collect, and remix prior knowledge because it is scattered across several silos without seamless integration and access. This lack of a structured information framework hinders research by both experimental and computational scientists.

Results: To better organize knowledge and data, we built a structured review article focused specifically on NGLY1 Deficiency, an ultra-rare genetic disease first reported in 2012. We represented this structured review as a knowledge graph and stored the graph in a Neo4j database to simplify dissemination, querying, and visualization of the network. Relative to free text, this structured review better promotes the principles of findability, accessibility, interoperability, and reusability (FAIR). In collaboration with domain experts in NGLY1 Deficiency, we demonstrate how this resource can improve the efficiency and comprehensiveness of hypothesis generation. We also developed a read-write interface that allows domain experts to contribute FAIR structured knowledge to this community resource. In contrast to traditional free-text review articles, this structured review exists as a living knowledge graph that is curated by humans and accessible to computational analyses. Finally, we have generalized this workflow into modular and repurposable components that can be applied to other domain areas. This NGLY1 Deficiency-focused network is publicly available at http://ngly1graph.org/.

Availability and implementation: Source code and network data files are at https://github.com/SuLab/ngly1-graph and https://github.com/SuLab/bioknowledge-reviewer.

Contact: asu@scripps.edu
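
To make the idea of a queryable structured review concrete, the following is a minimal sketch of how review statements could be held as subject-predicate-object triples and searched for connecting paths, which is one simple form of hypothesis generation. It is an illustration only: the published resource stores its graph in Neo4j, whereas this sketch uses plain Python structures, and the node names, predicates, and the helper functions `build_index` and `find_paths` are hypothetical placeholders rather than content or code from the NGLY1 Deficiency graph.

```python
from collections import defaultdict, deque

# Toy triples (subject, predicate, object).
# ASSUMPTION: illustrative placeholders only, not edges from the real graph.
TRIPLES = [
    ("gene:NGLY1", "causes", "disease:NGLY1-deficiency"),
    ("gene:NGLY1", "interacts_with", "gene:A"),
    ("gene:A", "regulates", "gene:B"),
    ("gene:B", "associated_with", "phenotype:X"),
    ("disease:NGLY1-deficiency", "has_phenotype", "phenotype:X"),
]


def build_index(triples):
    """Map each subject to its outgoing (predicate, object) pairs."""
    index = defaultdict(list)
    for subj, pred, obj in triples:
        index[subj].append((pred, obj))
    return index


def find_paths(index, start, goal, max_edges=3):
    """Breadth-first search for directed paths from start to goal.

    Each path is a list alternating nodes and predicates:
    [node, predicate, node, predicate, node, ...].
    """
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        n_edges = (len(path) - 1) // 2
        if n_edges >= max_edges:
            continue
        visited = path[::2]  # nodes sit at even positions
        for pred, obj in index.get(node, ()):
            if obj not in visited:  # avoid revisiting a node (no cycles)
                queue.append(path + [pred, obj])
    return paths


if __name__ == "__main__":
    index = build_index(TRIPLES)
    for p in find_paths(index, "gene:NGLY1", "phenotype:X"):
        print(" -> ".join(p))
```

Each returned path is a chain of curated statements linking two concepts, which a domain expert could then assess as a candidate mechanistic hypothesis. Breadth-first search is used here so that shorter chains are returned first; a graph database would typically express the same request as a path query instead.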