Community Next Steps for Making Globally Unique Identifiers Work for Biocollections Data

Abstract: Biodiversity data are being digitized and made available online at a rapidly increasing rate, but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has been neither coalescence towards a single identifier solution (as in some other domains) nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. To further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.
