Representing and querying disease networks using graph databases

Background: Systems biology experiments generate large volumes of data of multiple modalities. Integrating this information is challenging because it combines high complexity with rich semantics. Here, we describe how graph databases provide a powerful framework for storage, querying and visualization of biological data.

Results: We show how graph databases are well suited to the representation of biological information, which is typically highly connected, semi-structured and unpredictable. We outline an application case that uses the Neo4j graph database to build and query a prototype network that provides biological context for asthma-related genes.

Conclusions: Our study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation.
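
As a rough illustration of why a graph model suits highly connected, semi-structured data, the following sketch builds a tiny in-memory property graph in plain Python and runs a two-hop traversal from a disease to the pathways of its associated genes. It is not the Neo4j prototype described above: the `PropertyGraph` class, the relationship names `ASSOCIATED_WITH` and `PARTICIPATES_IN`, and all gene and pathway identifiers (`G1`, `P1`, and so on) are hypothetical placeholders chosen for illustration, not data from the study.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph: labelled nodes, typed directed edges.

    Hypothetical helper for illustration only; it is not a Neo4j API.
    """

    def __init__(self):
        self.nodes = {}                     # node id -> {"label": ..., **properties}
        self.out_edges = defaultdict(list)  # node id -> [(rel_type, target id)]
        self.in_edges = defaultdict(list)   # node id -> [(rel_type, source id)]

    def add_node(self, node_id, label, **props):
        # Nodes need no fixed schema: each may carry a different set of properties.
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source, rel_type, target):
        self.out_edges[source].append((rel_type, target))
        self.in_edges[target].append((rel_type, source))

    def neighbours(self, node_id, rel_type, direction="out"):
        edges = self.out_edges if direction == "out" else self.in_edges
        return [other for rel, other in edges[node_id] if rel == rel_type]


def build_toy_graph():
    """Build a placeholder network; every identifier here is invented."""
    g = PropertyGraph()
    g.add_node("disease:asthma", "Disease", name="asthma")
    g.add_node("gene:G1", "Gene", symbol="G1")
    g.add_node("gene:G2", "Gene", symbol="G2")
    g.add_node("gene:G3", "Gene", symbol="G3")
    g.add_node("pathway:P1", "Pathway", name="P1")
    # An extra property on one node only, showing the semi-structured model.
    g.add_node("pathway:P2", "Pathway", name="P2", source="placeholder")
    g.add_edge("gene:G1", "ASSOCIATED_WITH", "disease:asthma")
    g.add_edge("gene:G2", "ASSOCIATED_WITH", "disease:asthma")
    g.add_edge("gene:G1", "PARTICIPATES_IN", "pathway:P1")
    g.add_edge("gene:G2", "PARTICIPATES_IN", "pathway:P1")
    g.add_edge("gene:G2", "PARTICIPATES_IN", "pathway:P2")
    g.add_edge("gene:G3", "PARTICIPATES_IN", "pathway:P2")
    return g


def pathway_context(g, disease_id):
    """Map each pathway to the disease-associated genes that participate in it."""
    context = defaultdict(set)
    for gene in g.neighbours(disease_id, "ASSOCIATED_WITH", direction="in"):
        for pathway in g.neighbours(gene, "PARTICIPATES_IN"):
            context[pathway].add(gene)
    return dict(context)


if __name__ == "__main__":
    graph = build_toy_graph()
    for pathway, genes in sorted(pathway_context(graph, "disease:asthma").items()):
        print(pathway, sorted(genes))
```

The traversal follows relationships directly from node to node instead of joining tables, which is the property that makes exploratory queries of this kind natural to express in a graph database. In a real deployment the same question would be posed in the database's own query language rather than in hand-written Python.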
