NBDC RDF portal: a comprehensive repository for semantic data in life sciences

Abstract

In the life sciences, researchers increasingly want to access multiple databases in an integrated way. However, different databases currently use different formats and vocabularies, hindering the proper integration of heterogeneous life science data. Adopting the Resource Description Framework (RDF) has the potential to address such issues by improving database interoperability, leading to advances in automatic data processing. Based on this idea, we have advised many Japanese database development groups to expose their databases in RDF. To further promote such activities, we have developed an RDF-based life science dataset repository called the National Bioscience Database Center (NBDC) RDF portal. All the datasets in this repository have been reviewed by the NBDC to ensure interoperability and queryability. As of July 2018, the service includes 21 RDF datasets, comprising over 45.5 billion triples. It provides SPARQL endpoints for all datasets, useful metadata and the ability to download RDF files. The NBDC RDF portal can be accessed at https://integbio.jp/rdf/.
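
As an illustration of how a client might use a SPARQL endpoint such as those the portal provides, the following minimal Python sketch builds a standard SPARQL 1.1 Protocol GET request and flattens a JSON result document into rows. The endpoint URL, the query and the function names (`build_request`, `parse_bindings`, `run_query`) are assumptions made for illustration only; they are not taken from the portal, whose actual endpoint addresses should be looked up on its pages.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder endpoint URL, not the portal's real address.
ENDPOINT = "https://example.org/sparql"

# Assumption: a generic query that lists a few arbitrary triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of {variable: value} rows."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30.0):
    """Send the query over the network and return the parsed rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, QUERY)` against a real endpoint would return a list of dictionaries keyed by the query variables. Whether a given endpoint accepts GET requests and serves JSON results depends on its configuration, so both should be checked before relying on this sketch.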
