Increasing Life Science Resources Re-Usability using Semantic Web Technologies

In the life sciences, current standardization and integration efforts focus on reference data and knowledge bases. However, the results of original studies are generally provided in non-standardized, study-specific formats. In addition, the formalization of analysis pipelines is often limited to textual descriptions in the methods section. Both factors impair the reproducibility of results, their maintenance, and their reuse in other studies. Semantic Web technologies have proven effective at facilitating the integration and reuse of reference data and knowledge bases. We therefore hypothesize that Semantic Web technologies also facilitate the reproducibility and reuse of life science studies involving pipelines that compute associations between entities according to intermediary relations and dependencies. To assess this hypothesis, we considered a case study in systems biology (http://regulatorycircuits.org), which provides tissue-specific regulatory interaction networks to elucidate perturbations across complex diseases. Our approach consisted of surveying the complete set of provided supplementary files to reveal the underlying structure among the biological entities described in the data. We relied on this structure and used Semantic Web technologies (i) to integrate the Regulatory Circuits data, and (ii) to formalize the analysis pipeline as SPARQL queries. The result was a dataset of 335,429,988 triples, on which two SPARQL queries were sufficient to extract each tissue-specific regulatory network.
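
The idea of computing associations between entities through intermediary relations can be illustrated with a minimal sketch. It is written in plain Python rather than SPARQL so that it runs without dependencies, and it is not the authors' pipeline. Every identifier in it (the `ex:` prefix, the predicates `ex:bindsTo`, `ex:regulates` and `ex:activeIn`, and the toy entities) is a hypothetical assumption, not the Regulatory Circuits schema. The nested loops play the role of a graph-pattern join that a single query would express declaratively.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All names below are hypothetical and only serve the illustration.
TRIPLES = {
    ("ex:TF1", "ex:bindsTo", "ex:Enh1"),
    ("ex:TF2", "ex:bindsTo", "ex:Enh2"),
    ("ex:Enh1", "ex:regulates", "ex:GeneA"),
    ("ex:Enh2", "ex:regulates", "ex:GeneB"),
    ("ex:Enh1", "ex:activeIn", "ex:Liver"),
    ("ex:Enh2", "ex:activeIn", "ex:Lung"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """Return every subject s such that (s, predicate, obj) is a triple."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def tissue_network(tissue):
    """Join the three relations on the shared enhancer to obtain
    (transcription factor, gene) edges for one tissue."""
    edges = set()
    for enhancer in subjects("ex:activeIn", tissue):
        for tf in subjects("ex:bindsTo", enhancer):
            for gene in objects(enhancer, "ex:regulates"):
                edges.add((tf, gene))
    return edges


print(sorted(tissue_network("ex:Liver")))
```

In this toy setting the intermediary entity (the enhancer) never appears in the output; it only links a transcription factor to a gene within a given tissue, which is the kind of derived association the abstract refers to.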
