Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens

We present the model and implementation of a workflow that blazes a trail in systematic biology for the re-usability of character data (data on any kind of characters of pheno- and genotypes of organisms) and their additivity from specimen to taxon level. We take into account that any taxon characterization is based on a limited set of sampled individuals and characters, and that consequently any new individual and any new character may affect the recognition of biological entities and/or the subsequent delimitation and characterization of a taxon. Taxon concepts thus frequently change during the knowledge generation process in systematic biology. Structured character data are therefore needed not only for the knowledge generation process but also for easily adapting characterizations of taxa. We aim to facilitate the construction and reproducibility of taxon characterizations from structured character data of changing sample sets by establishing a stable and unambiguous association between each sampled individual and the data processed from it. Our workflow implementation uses the European Distributed Institute of Taxonomy (EDIT) Platform, a comprehensive taxonomic data management and publication environment, to: (i) establish a reproducible connection between sampled individuals and all samples derived from them; (ii) stably link sample-based character data with the metadata of the respective samples; (iii) record and store structured specimen-based character data in formats allowing data exchange; (iv) reversibly assign sample metadata and character datasets to taxa in an editable classification and display them; and (v) organize data exchange via standard exchange formats and link the character datasets to the samples held in research collections, ensuring high visibility and instant re-usability of the data. The implemented workflow will contribute to organizing the interface between phylogenetic analysis and revisionary taxonomic or monographic work.
Database URL: http://campanula.e-taxonomy.net/
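The Python sketch below illustrates, under simplifying assumptions, the core idea of points (i), (ii) and (iv). Every sample points to the unit it was derived from. Character data are stored against samples. Specimens are assigned to taxa through an editable mapping, and a taxon characterization is recomputed from the data of all currently assigned specimens. The class names `Unit` and `Workflow`, their methods, and the representation of characters as sets of states are hypothetical. This is not the data model or API of the EDIT Platform.

```python
"""Minimal sketch of specimen-linked, additive character data.

All class, method and field names are hypothetical illustrations; they are
not the data model or API of the EDIT Platform.
"""
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Unit:
    """A preserved individual specimen or a sample derived from one."""
    unit_id: str
    derived_from: Unit | None = None  # None marks the sampled individual itself

    def specimen(self) -> Unit:
        """Follow the derivation chain back to the sampled individual."""
        unit = self
        while unit.derived_from is not None:
            unit = unit.derived_from
        return unit


@dataclass
class Workflow:
    # Character data recorded per sample: unit_id -> {character: set of states}
    data: dict[str, dict[str, set[str]]] = field(default_factory=dict)
    # Every unit that carries character data, keyed by its identifier
    units: dict[str, Unit] = field(default_factory=dict)
    # Editable, reversible assignment of specimens to taxa: specimen id -> taxon
    assignment: dict[str, str] = field(default_factory=dict)

    def record(self, unit: Unit, character: str, state: str) -> None:
        """Store a character state against the sample it was observed on."""
        self.units[unit.unit_id] = unit
        self.data.setdefault(unit.unit_id, {}).setdefault(character, set()).add(state)

    def assign(self, specimen: Unit, taxon: str) -> None:
        """(Re-)assign a specimen; no character data are copied or deleted."""
        self.assignment[specimen.unit_id] = taxon

    def characterize(self, taxon: str) -> dict[str, set[str]]:
        """Rebuild a taxon characterization from the samples of its specimens."""
        summary: dict[str, set[str]] = defaultdict(set)
        for unit_id, characters in self.data.items():
            root = self.units[unit_id].specimen()
            if self.assignment.get(root.unit_id) == taxon:
                for character, states in characters.items():
                    summary[character] |= states  # union into a fresh set
        return dict(summary)


if __name__ == "__main__":
    spec1 = Unit("spec-1")
    leaf1 = Unit("spec-1/leaf", derived_from=spec1)
    spec2 = Unit("spec-2")
    wf = Workflow()
    wf.record(leaf1, "corolla colour", "blue")
    wf.record(spec2, "corolla colour", "white")
    wf.assign(spec1, "Taxon A")
    wf.assign(spec2, "Taxon A")
    print(sorted(wf.characterize("Taxon A")["corolla colour"]))  # ['blue', 'white']
    wf.assign(spec2, "Taxon B")  # revised delimitation
    print(sorted(wf.characterize("Taxon A")["corolla colour"]))  # ['blue']
```

Because the taxon characterization is derived rather than stored, reassigning a specimen or adding a new one changes the taxon-level result without altering the specimen-level data. This is the additivity and reversibility the workflow aims at.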
