Image-based classification of plant genus and family for trained and untrained plant species

Background: Modern plant taxonomy reflects phylogenetic relationships among taxa based on proposed morphological and genetic similarities. However, taxonomic relatedness is not necessarily reflected by close overall resemblance, but rather by a small number of very specific shared morphological characters or by similarity at the molecular level. It is an open research question to what extent phylogenetic relations within higher taxonomic levels such as genera and families are reflected by shared visual characters of their constituent species. Consequently, it is even less certain whether the taxonomy of plants at these levels can be identified from images using machine learning techniques.

Results: Whereas previous studies on automated plant identification from images focused on the species level, we investigated classification at higher taxonomic levels such as genera and families. We used images of 1000 plant species that are representative of the flora of Western Europe. We tested how accurately a visual representation of genera and families can be learned from images of their species in order to identify the taxonomy of species included in and excluded from learning. With natural images of random content, roughly 500 images per species are required for accurate classification. The classification accuracy for 1000 species amounts to 82.2% and increases to 85.9% and 88.4% at the genus and family level, respectively. For species excluded from training, the accuracy drops substantially to 38.3% and 38.7% at the genus and family level. Excluded species of well-represented genera and families can be classified with 67.8% and 52.8% accuracy, respectively.

Conclusion: Our results show that shared visual characters are indeed present at higher taxonomic levels. They are preserved most prominently in flowers and leaves, and they enable state-of-the-art classification algorithms to learn accurate visual representations of plant genera and families. Given a sufficient amount and composition of training data, we show that this allows for high classification accuracy that increases with the taxonomic level and even facilitates the taxonomic identification of species excluded from the training process.
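The evaluation on species excluded from training can be illustrated with a small sketch. It is not the authors' code. It assumes a hypothetical species-to-genus mapping (`TAXONOMY`), a hypothetical rule that holds out one species from every genus with at least `min_species` members, and a classifier that outputs genus labels directly. The function names and the threshold are illustrative choices only.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (species -> genus); the study used 1000 species.
TAXONOMY = {
    "Trifolium pratense": "Trifolium",
    "Trifolium repens": "Trifolium",
    "Trifolium campestre": "Trifolium",
    "Ranunculus acris": "Ranunculus",
    "Ranunculus repens": "Ranunculus",
    "Ranunculus bulbosus": "Ranunculus",
    "Bellis perennis": "Bellis",
}


def split_excluded_species(species_to_genus, min_species=3):
    """Hold out one species per genus that has at least `min_species` members.

    Assumed rule for illustration: the alphabetically last species of each
    sufficiently large genus is excluded from training.
    Returns (train_species, held_out_species).
    """
    by_genus = defaultdict(list)
    for species, genus in species_to_genus.items():
        by_genus[genus].append(species)

    train, held_out = [], []
    for genus in sorted(by_genus):
        members = sorted(by_genus[genus])
        if len(members) >= min_species:
            held_out.append(members[-1])
            train.extend(members[:-1])
        else:
            # Too few species to spare one: keep the whole genus for training.
            train.extend(members)
    return train, held_out


def genus_accuracy(true_species, predicted_genera, species_to_genus):
    """Fraction of images whose predicted genus matches the true species' genus."""
    correct = sum(
        species_to_genus[species] == genus
        for species, genus in zip(true_species, predicted_genera)
    )
    return correct / len(true_species)


train_species, held_out_species = split_excluded_species(TAXONOMY)
```

A genus-level classifier trained only on images of `train_species` can then be scored on images of `held_out_species` with `genus_accuracy`. The same scheme applies at the family level with a species-to-family mapping.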
