Landscape Analysis for the Specimen Data Refinery

This report reviews the current state of the art in applied approaches to automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems, and identify areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens.
