OpenBiodiv: A Knowledge Graph for Literature-Extracted Linked Open Data in Biodiversity Science

Hundreds of years of biodiversity research have produced a substantial pool of communal knowledge; however, most of it is stored in silos isolated from each other, such as published articles and monographs. The need for a system that stores and manages collective biodiversity knowledge in a community-agreed, interoperable and open format has evolved into the concept of the Open Biodiversity Knowledge Management System (OBKMS). This paper presents OpenBiodiv, an OBKMS that uses semantic publishing workflows, text and data mining, common standards, ontology modelling and graph database technologies to establish a robust infrastructure for managing biodiversity knowledge. OpenBiodiv is presented as a Linked Open Dataset generated from scientific literature. It encompasses data extracted from more than 5,000 scholarly articles published by Pensoft, together with many more taxonomic treatments extracted by Plazi from journals of other publishers. The data from both sources are converted to the Resource Description Framework (RDF) and integrated in a graph database using the OpenBiodiv-O ontology and an RDF version of the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Through the application of semantic technologies, the project showcases the value of openly publishing Findable, Accessible, Interoperable and Reusable (FAIR) data for establishing open science practices in the biodiversity domain.
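
To make the conversion step more concrete, the following minimal sketch shows how one article and its taxonomic treatments could be expressed as RDF triples in N-Triples syntax. It is an illustration only, not the OpenBiodiv pipeline: the `http://example.org/` namespace, the `Article` and `Treatment` class names, and the function `article_to_ntriples` are hypothetical placeholders, and the real dataset uses terms from the OpenBiodiv-O ontology instead. Only the `rdf:type` and Dublin Core Terms predicates are existing, widely used vocabulary terms.

```python
# Well-known predicate IRIs (RDF and Dublin Core Terms).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

# Hypothetical namespace used only for this illustration;
# it does not correspond to real OpenBiodiv identifiers.
EX = "http://example.org/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Serialize a plain string literal, escaping backslash, quote and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one subject-predicate-object statement as an N-Triples line."""
    return f"{subject} {predicate} {obj} ."


def article_to_ntriples(article_id, title, treatment_ids):
    """Return N-Triples lines for an article and the treatments it contains."""
    article = iri(EX + "article/" + article_id)
    lines = [
        triple(article, iri(RDF_TYPE), iri(EX + "Article")),
        triple(article, iri(DCT_TITLE), literal(title)),
    ]
    for treatment_id in treatment_ids:
        treatment = iri(EX + "treatment/" + treatment_id)
        lines.append(triple(treatment, iri(RDF_TYPE), iri(EX + "Treatment")))
        # Link each treatment back to the article it was extracted from.
        lines.append(triple(treatment, iri(DCT_IS_PART_OF), article))
    return lines


if __name__ == "__main__":
    for line in article_to_ntriples("a1", "A new species description", ["t1"]):
        print(line)
```

Lines in this form can be loaded into any RDF graph database. Because every statement uses the same article IRI, data extracted from different sources about the same resource merge into one graph node, which is the property the integration step relies on.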
