The Additivity Project: Achieving additivity of structured taxonomic character data by persistently linking them to individual specimens

© Plitzner P et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Herbarium specimens have always played a central role in the classical disciplines of plant sciences, and global digitisation efforts now open new horizons. To make full use of the inherent possibilities of specimen-based taxonomic descriptions, corresponding workflows are needed. A crucial step in the comparative analysis of organisms is the preparation of a character matrix to record and compare the morphological variation of taxa on the basis of individual specimens. This project focuses on optimising the taxonomic research process with respect to the delimitation and characterisation (“descriptions”) of taxa (Henning et al. 2018). The angiosperm order Caryophyllales provides exemplar use cases through cooperation with the Global Caryophyllales Initiative (Borsch et al. 2015). The workflow for sample data handling (Kilian et al. 2015), implemented on the EDIT Platform for Cybertaxonomy (http://www.cybertaxonomy.org, Ciardelli et al. 2009), has been extended to support additive characterisation of taxa via specimen character data. The Common Data Model (CDM), which already supports persistent inter-linking of specimens and their metadata (Plitzner et al. 2017), has been adapted to facilitate specimen descriptions with characters constructed from the combination of structure and property terms and their corresponding states. Semantic web technology is used to establish and continuously elaborate expert community-coordinated exemplar vocabularies with term ontologies and explanations for characters and states (GFBio Terminology Service, Karam et al. 2016).
Character data are recorded and stored in structured form in character state matrices for individual specimens instead of taxa, which allows taxon characterisations to be generated by aggregating the data sets of the specimens included. Separating characters into structures and properties, both based on concepts in public ontologies, guarantees high visibility and instant re-usability of these character data. Because taxon concepts evolve during the iterative knowledge generation process in systematic biology, additivity of character data from specimen to taxon level greatly facilitates the construction and reproducibility of taxon characterisations from changing specimen and character data sets.
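
The additive principle described above can be illustrated with a minimal Python sketch. It is not the CDM or the EDIT Platform API: the class names `Character` and `SpecimenDescription`, the function `aggregate`, the field names, and the example specimens and values are all hypothetical and chosen only for illustration. The sketch assumes that categorical states are aggregated by counting the specimens that show each state, and that quantitative characters are summarised by their minimum, maximum, and number of measurements.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Character:
    """Hypothetical character: a structure term combined with a property term."""
    structure_term: str  # e.g. "leaf"
    property_term: str   # e.g. "shape"


@dataclass
class SpecimenDescription:
    """Hypothetical record of character data for one individual specimen."""
    specimen_id: str
    states: dict = field(default_factory=dict)        # Character -> set of state terms
    measurements: dict = field(default_factory=dict)  # Character -> list of numbers


def aggregate(descriptions):
    """Build a taxon-level characterisation from the specimens assigned to a taxon."""
    state_counts = defaultdict(lambda: defaultdict(int))
    values = defaultdict(list)
    for description in descriptions:
        # Count, per character, how many specimens show each state.
        for character, states in description.states.items():
            for state in states:
                state_counts[character][state] += 1
        # Pool all measurements of a quantitative character.
        for character, numbers in description.measurements.items():
            values[character].extend(numbers)
    categorical = {c: dict(counts) for c, counts in state_counts.items()}
    quantitative = {
        c: {"min": min(v), "max": max(v), "n": len(v)}
        for c, v in values.items()
        if v  # skip characters without any measurement
    }
    return categorical, quantitative


# Invented example data: two specimens currently assigned to the same taxon.
leaf_shape = Character("leaf", "shape")
leaf_length = Character("leaf", "length")
s1 = SpecimenDescription("specimen-1", {leaf_shape: {"ovate"}}, {leaf_length: [4.0, 5.5]})
s2 = SpecimenDescription("specimen-2", {leaf_shape: {"ovate", "elliptic"}}, {leaf_length: [6.0]})

categorical, quantitative = aggregate([s1, s2])
```

Because the data stay attached to the specimens, a revised taxon concept only requires calling `aggregate` again with the changed list of specimens; no taxon-level description has to be edited by hand.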