‘openDS’ – A new standard for digital specimens and other natural science digital object types

With projected lifespans of many decades, infrastructure initiatives such as Europe’s Distributed System of Scientific Collections (DiSSCo), the USA’s Integrated Digitized Biocollections (iDigBio), the National Specimen Information Infrastructure (NSII) of China and Australia’s digitisation of national research collections (NRCA Digital) aim to transform today’s slow, inefficient and limited practices of working with natural science collections. The need to borrow specimens (plants, animals, fossils or rocks) or to visit collections physically, together with the absence of links to other relevant information, represents a significant impediment to answering today’s scientific and societal questions. A logical extension of the Internet, Digital Object Architecture (Kahn and Wilensky 2006) offers a way of grouping, managing and processing fragments of information relating to a natural science specimen. A ‘digital specimen’ acts as a surrogate in cyberspace for a specific physical specimen, identifying its actual location and providing authoritative information about its collection event (who, when, where) and taxonomy, as well as links to high-resolution images. A digital specimen also exposes supplementary information that is stored outside the natural science collection itself, such as related literature, traits, tissue samples and DNA sequences, chemical analyses and environmental information. By presenting digital specimens as a new layer between the data infrastructure of natural science collections and the user applications that process and interact with information about specimens and collections, it becomes possible to organise seamless global access spanning multiple collection-holding institutions and sources.
Virtual collections of digital specimens with unique identifiers offer possibilities for wider, more flexible and ‘FAIR’ (Findable, Accessible, Interoperable, Reusable) access for varied research and policy uses: recognising curatorial work, annotating with the latest taxonomic treatments, understanding variations, working with DNA sequences or chemical analyses, supporting regulatory processes for health, food, security, sustainability and environmental change, inventions/products critical to the bio-economy, and educational uses. Adopting a digital specimen approach is expected to lead to faster insights at lower cost on many fronts. We propose that realising this vision requires a new TDWG standard. OpenDS is a specification of the digital specimen and other object types essential to mass digitisation of natural science collections and their digital use. For five principal digital object types, corresponding to major categories of information about collections and specimens, OpenDS defines structure and content, and the behaviours that can act upon them:
1. Digital specimen: represents a digitised physical specimen and contains information about a single specimen, with links to related supplementary information;
2. Storage container: represents a group of specimens stored within a single container, such as an insect tray, drawer or sample jar;
3. Collection: information about the characteristics of a collection;
4. Organisation: information about the legal entity owning the specimen and the collection to which it belongs; and
5. Interpretation: assertion(s) made on or about the specimen, such as determination of species and comments.
Secondary classes gather presentation/preservation characteristics (e.g., herbarium sheets, pinned insects, specimens in glass jars), the general classification of a specimen (e.g., plant, animal, fossil, rock) and the history of actions on the object (provenance).
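As an illustration only, the five principal object types and one simple behaviour might be sketched as Python data classes. Every class attribute, method name and identifier below is a hypothetical choice made for this sketch and is not taken from the OpenDS specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Organisation:
    """Legal entity owning a specimen and the collection it belongs to."""
    identifier: str  # hypothetical attribute name
    name: str


@dataclass
class Collection:
    """Characteristics of a collection."""
    identifier: str
    name: str
    owner: Organisation


@dataclass
class StorageContainer:
    """Group of specimens stored within a single container."""
    identifier: str
    kind: str  # e.g. "insect tray", "drawer", "sample jar"
    specimen_ids: List[str] = field(default_factory=list)


@dataclass
class Interpretation:
    """Assertion made on or about a specimen."""
    kind: str  # e.g. "determination" or "comment"
    value: str
    asserted_by: str


@dataclass
class DigitalSpecimen:
    """Surrogate for a single physical specimen."""
    identifier: str
    physical_location: str
    collection: Collection
    interpretations: List[Interpretation] = field(default_factory=list)
    related_links: List[str] = field(default_factory=list)

    def add_interpretation(self, interpretation: Interpretation) -> None:
        """Example behaviour: attach a new assertion to the specimen."""
        self.interpretations.append(interpretation)

    def current_determination(self) -> Optional[str]:
        """Return the most recently added determination, if there is one."""
        for interpretation in reversed(self.interpretations):
            if interpretation.kind == "determination":
                return interpretation.value
        return None
```

In this sketch an annotation such as a new taxonomic determination is added as an Interpretation object, so earlier assertions remain attached to the specimen and are not overwritten.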
OpenDS establishes equivalences with concepts in ABCD 3.0 and the EFG extension for geosciences. It is also an ontology that extends OBO Foundry’s Biological Collections Ontology (BCO) (Walls et al. 2014) from bco:MaterialSample, which has the preferred label dwc:specimen from Darwin Core, thus also linking OpenDS with that standard. OpenDS object content can be serialised to specific formats or representations (e.g. JSON) for different exchange and processing purposes.
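To make the idea of serialisation concrete, the following minimal sketch writes an invented digital specimen record to JSON and reads it back using Python's standard json module. The keys, values and identifier are assumptions for illustration and do not reproduce the actual OpenDS schema.

```python
import json

# Hypothetical record: all keys and values are invented for illustration.
example_specimen = {
    "type": "DigitalSpecimen",
    "id": "example-prefix/specimen-0001",
    "physicalLocation": "Example Museum, Cabinet 4, Drawer 12",
    "collectionEvent": {
        "who": "A. Collector",
        "when": "1923-05-14",
        "where": "Example locality",
    },
    "interpretations": [
        {"kind": "determination", "value": "Apis mellifera"},
    ],
    "images": ["https://example.org/images/specimen-0001.tif"],
}


def serialise_specimen(record: dict) -> str:
    """Serialise a record to JSON text with stable key order."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)


def deserialise_specimen(text: str) -> dict:
    """Parse JSON text back into a Python dictionary."""
    return json.loads(text)


if __name__ == "__main__":
    print(serialise_specimen(example_specimen))
```

Sorting the keys gives the same text for the same content, which can help when serialised objects are compared or exchanged between systems.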