Publishing DisGeNET as Nanopublications

The increasing and unprecedented publication rate in the biomedical field is a major bottleneck for knowledge discovery in the Life Sciences. The manual curation of facts from published scientific papers is slow and inefficient, and therefore new approaches are needed that can enable the automatic, scalable and reliable extraction of assertions. While the publication of scientific assertions and datasets on the Semantic Web is gaining traction, it also creates new challenges such as the proper representation of provenance and versioning. Here, we address these issues and describe our efforts to represent the DisGeNET database of human gene-disease associations as permanent, immutable, and provenance-rich digital objects called nanopublications. Our nanopublications are the first instance of a Linked Data model that ensures stable interlinking of the assertion and its metadata by Trusty URIs. As DisGeNET integrates manually curated as well as text-mined data of different origins, the semantic description of the evidence for each assertion is important to provide trust and allow evidence-based hypothesis generation. Here, we describe our steps to ensure high quality and demonstrate the utility of linking our data to other datasets on the emerging Semantic Web.
