Tools for Semantic Annotation of Taxonomic Descriptions

This paper reports a software application for the automated semantic annotation of taxonomic descriptions, especially morphological descriptions. The tool is based on unsupervised machine learning methods and is designed to handle the deviated syntax that is common in morphological descriptions but differs from standard English. Because it is unsupervised, the system needs no training examples to annotate text descriptions. It can draw on a relevant glossary when one is available, but it aims to learn as much information as possible from the text itself. Tools such as this are needed to convert free-text or OCRed taxonomic documents into a semantically explicit format. Such a format supports easy and intelligent access and provides character data for phylogenetic research, studies of climate impact on biodiversity, and traditional biosystematics research.
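The abstract does not describe the learning procedure itself. The following Python sketch is only an illustration of how an unsupervised annotator might learn from the text itself and exploit the deviated syntax of morphological descriptions. It assumes that each clause (delimited by a period or semicolon) begins with a structure name followed by that structure's character states. The seed glossary, the function names (`learn_structures`, `learn_states`, `annotate`), and the clause-initial heuristic are all hypothetical and are not the paper's algorithm.

```python
import re
from collections import Counter

# Hypothetical seed glossary of character-state terms (assumption for
# illustration); a real system would load a domain glossary instead.
GLOSSARY_STATES = {"alternate", "ovate", "glabrous", "serrate", "pubescent", "entire"}


def split_clauses(description):
    """Split a telegraphic description into clauses at periods and semicolons."""
    return [c.strip() for c in re.split(r"[.;]", description) if c.strip()]


def tokens(clause):
    """Return lower-case alphabetic tokens; punctuation and numbers are dropped."""
    return re.findall(r"[a-z]+", clause.lower())


def learn_structures(descriptions, min_count=2):
    """Assumed heuristic: clause-initial words absent from the state glossary
    are candidate structure (organ) names; keep those seen >= min_count times."""
    counts = Counter()
    for text in descriptions:
        for clause in split_clauses(text):
            words = tokens(clause)
            if words and words[0] not in GLOSSARY_STATES:
                counts[words[0]] += 1
    return {term for term, n in counts.items() if n >= min_count}


def learn_states(descriptions, structures):
    """Collect words that follow a learned structure name in its clause as
    candidate character states, including words missing from the glossary."""
    states = set()
    for text in descriptions:
        for clause in split_clauses(text):
            words = tokens(clause)
            if words and words[0] in structures:
                states.update(w for w in words[1:] if w not in structures)
    return states


def annotate(clause, structures, states):
    """Label each token of a clause as 'structure', 'state', or 'other'."""
    labels = []
    for w in tokens(clause):
        if w in structures:
            labels.append((w, "structure"))
        elif w in states or w in GLOSSARY_STATES:
            labels.append((w, "state"))
        else:
            labels.append((w, "other"))
    return labels


# Usage on two made-up descriptions (illustrative input, not data from the paper).
docs = [
    "Leaves alternate; blade ovate, glabrous; margin serrate.",
    "Stems pubescent. Leaves opposite; blade lanceolate; margin entire.",
]
structures = learn_structures(docs)        # {'leaves', 'blade', 'margin'}
states = learn_states(docs, structures)
print(sorted(states - GLOSSARY_STATES))    # ['lanceolate', 'opposite']
print(annotate("Blade ovate, 3-5 cm", structures, states))
# [('blade', 'structure'), ('ovate', 'state'), ('cm', 'other')]
```

In this toy run, "opposite" and "lanceolate" are labeled as states even though the seed glossary lacks them, which mirrors the stated goal of relying on a glossary while learning additional terms from the descriptions. A practical system would need to handle plurals, measurements, and clauses that do not start with a structure name.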