Graph Databases in Molecular Biology

In recent years, the growth in the volume of data generated by everyday social activity, and in particular across all fields of research, has driven the rise of new database models, many of which have been adopted in Molecular Biology. NoSQL graph databases have been used in many kinds of research on biological data, especially where data integration is a determining factor. For the most part, they are used to represent relationships between data along two main lines: (i) inferring knowledge from existing relationships; and (ii) representing relationships derived from prior knowledge of the data. In this work, a short timeline of events introduces the joint evolution of databases and Molecular Biology. We then present how graph databases have been used in Molecular Biology research based on High-Throughput Sequencing data, and discuss their role and the open research questions in this area.
