From a Conceptual Model to a Knowledge Graph for Genomic Datasets

Data access at genomic repositories is problematic because data are described by heterogeneous and hardly comparable metadata. We previously introduced a unified conceptual schema, collected metadata in a single repository, and provided classical search methods over them. Here we propose a new paradigm to support semantic search of integrated genomic metadata, based on the Genomic Knowledge Graph: a semantic graph of genomic terms and concepts that combines the original information provided by each source with curated terminological content from specialized ontologies.
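The abstract describes the Genomic Knowledge Graph only at a high level. As a rough illustration of the general idea (keeping each source's original metadata values, linking them to ontology terms with synonyms and an is-a hierarchy, and using those links to answer keyword queries), here is a minimal Python sketch. It is not the authors' implementation: the class `GenomicKG`, its methods, and the term identifiers such as `T:cancer` are hypothetical, and a small hand-built hierarchy stands in for the specialized ontologies the paper draws on.

```python
from collections import defaultdict


class GenomicKG:
    """Toy knowledge graph (hypothetical): datasets -> original values -> ontology terms."""

    def __init__(self):
        self.dataset_values = defaultdict(set)  # dataset id -> raw metadata values from the source
        self.value_terms = defaultdict(set)     # raw value -> ontology term ids it is annotated with
        self.term_labels = defaultdict(set)     # term id -> preferred label and synonyms (lowercased)
        self.children = defaultdict(set)        # term id -> direct subterms (inverse of is_a)

    def add_dataset(self, dataset, values):
        self.dataset_values[dataset].update(values)

    def annotate(self, value, term):
        self.value_terms[value].add(term)

    def add_term(self, term, labels, parent=None):
        self.term_labels[term].update(label.lower() for label in labels)
        if parent is not None:
            self.children[parent].add(term)

    def _descendants(self, term):
        # Depth-first walk over is_a children; the result includes the term itself.
        seen, stack = set(), [term]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(self.children[t])
        return seen

    def search(self, keyword):
        kw = keyword.lower()
        # 1. Match the keyword against term labels and synonyms.
        matched = {t for t, labels in self.term_labels.items() if kw in labels}
        # 2. Expand every matched term to all of its more specific subterms.
        expanded = set()
        for t in matched:
            expanded |= self._descendants(t)
        # 3. A dataset matches if one of its raw values equals the keyword
        #    or is annotated with any expanded term.
        hits = set()
        for ds, values in self.dataset_values.items():
            for v in values:
                if v.lower() == kw or self.value_terms[v] & expanded:
                    hits.add(ds)
                    break
        return sorted(hits)


if __name__ == "__main__":
    kg = GenomicKG()
    kg.add_term("T:cancer", ["cancer", "malignant neoplasm"])
    kg.add_term("T:breast_cancer", ["breast cancer", "breast carcinoma"], parent="T:cancer")
    kg.annotate("Breast Invasive Carcinoma", "T:breast_cancer")
    kg.annotate("BRCA", "T:breast_cancer")
    kg.add_dataset("DS1", {"Breast Invasive Carcinoma"})
    kg.add_dataset("DS2", {"BRCA"})
    kg.add_dataset("DS3", {"normal tissue"})
    print(kg.search("malignant neoplasm"))  # ['DS1', 'DS2']
```

In this toy setup, searching for "malignant neoplasm" returns `DS1` and `DS2` even though neither dataset stores that phrase. The synonym matches the parent term, and the hierarchy expansion reaches the subterm both raw values are annotated with. A plain keyword match over the original values would return nothing, which is the gap that ontology-backed semantic search is meant to close.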
