Can you make morphometrics work when you know the right answer? Pick and mix approaches for apple identification

Morphological classification of living things has challenged science for several centuries and has led to a wide range of objective morphometric approaches to data gathering and analysis. In this paper we explore those methods using apple cultivars, a model biological system in which discrete groups are pre-defined but overall morphological similarity is high. The effectiveness of morphometric techniques in discovering the groups is evaluated using statistical learning tools. No single technique proved optimal for classification on every occasion, although linear morphometric techniques slightly outperformed geometric ones (72.6% versus 66.7% accuracy on the test set). Combining these techniques, using post-hoc knowledge of their individual successes with particular cultivars, achieves a notably higher classification accuracy (77.8%). From this we conclude that, even with pre-determined discrete categories, a range of approaches is needed where those categories are intrinsically similar to each other. We also raise the question of whether, in studies that categorise potentially continuous natural variation, the level of match between categories is routinely set too high.
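One way to picture combining two classifiers using knowledge of their per-cultivar successes is sketched below. This is only an illustration under stated assumptions, not the procedure used in the paper: the combination rule (trust whichever classifier has the higher validation precision for the label it predicts), the function names, the class labels "A", "B" and "C", and all of the toy data are hypothetical.

```python
from collections import defaultdict


def per_class_precision(true_labels, predicted_labels):
    """For each predicted label, the fraction of those predictions that were correct."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[pred] += 1
        if truth == pred:
            correct[pred] += 1
    return {label: correct[label] / total[label] for label in total}


def combine(pred_a, pred_b, precision_a, precision_b):
    """Assumed rule: on disagreement, keep the prediction whose classifier was
    more reliable for that predicted label on held-out data (ties go to pred_a)."""
    combined = []
    for a, b in zip(pred_a, pred_b):
        if a == b:
            combined.append(a)
        else:
            score_a = precision_a.get(a, 0.0)
            score_b = precision_b.get(b, 0.0)
            combined.append(a if score_a >= score_b else b)
    return combined


def accuracy(true_labels, predicted_labels):
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical validation data: true cultivar labels and the predictions of a
# "linear" and a "geometric" morphometric classifier (toy values, not real results).
val_truth = ["A", "A", "B", "B", "C", "C"]
val_linear = ["A", "A", "B", "C", "C", "B"]
val_geometric = ["A", "B", "B", "B", "C", "C"]

precision_linear = per_class_precision(val_truth, val_linear)
precision_geometric = per_class_precision(val_truth, val_geometric)

# Hypothetical test data.
test_truth = ["A", "B", "C", "C"]
test_linear = ["A", "C", "C", "B"]
test_geometric = ["B", "B", "C", "C"]

combined = combine(test_linear, test_geometric, precision_linear, precision_geometric)

print("linear:", accuracy(test_truth, test_linear))
print("geometric:", accuracy(test_truth, test_geometric))
print("combined:", accuracy(test_truth, combined))
```

In this toy setup each classifier is reliable for different labels, so the per-label choice can recover cases that either classifier alone would miss. Because the reliabilities are estimated after the classifiers have been evaluated, a scheme like this needs a separate held-out set to avoid an optimistic accuracy estimate.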
