Multi-Dimensional Machine Learning Approaches for Fruit Shape Recognition and Phenotyping in Strawberry

Background: Shape is a critical element of the visual appeal of strawberry fruit and is determined by both genetic and non-genetic factors. Current fruit phenotyping approaches for external characteristics in strawberry rely on the human eye to make categorical assessments. However, fruit shape is multi-dimensional, continuously variable, and not adequately described by a single quantitative variable. Morphometric approaches enable the study of complex forms but are often abstract and difficult to interpret. In this study, we developed a mathematical approach, called the principal progression of k clusters (PPKC), for transforming fruit shape classifications from digital images onto an ordinal scale. We used these human-recognizable shape categories to select the features, extracted from multiple morphometric analyses, that are best suited to genome-wide and forward genetic analyses.

Results: We transformed images of strawberry fruit into human-recognizable categories using unsupervised machine learning, discovered four principal shape categories, and inferred their progression using PPKC. We extracted 67 quantitative features from digital images of strawberries using a suite of morphometric analyses and multivariate approaches. These analyses defined informative feature sets that effectively captured quantitative differences between shape classes. Classification accuracy ranged from 68.9% to 99.3% for the newly created, genetically correlated phenotypic variables describing shape.

Conclusions: Our results demonstrated that strawberry fruit shapes could be robustly quantified, accurately classified, and empirically ordered using image analyses, machine learning, and PPKC. We generated a dictionary of quantitative traits for studying and predicting shape classes and for identifying genetic factors underlying phenotypic variability for fruit shape in strawberry. The methods and approaches we applied in strawberry should apply to other fruits, vegetables, and specialty crops.
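The clustering step summarized above can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or how PPKC derives its ordering, so everything here is an assumption made for illustration: the silhouettes are synthetic ellipses rather than fruit images, the clustering is plain k-means (Lloyd's algorithm), and the ordering function `order_clusters_by_pc1` is a hypothetical stand-in that ranks cluster centroids along the first principal component. It is not the PPKC procedure.

```python
import numpy as np


def make_silhouette(aspect, size=32, radius=10.0):
    """Synthetic binary 'fruit' silhouette: a filled ellipse, flattened to 1-D.

    Hypothetical stand-in for a segmented fruit image.
    """
    yy, xx = np.mgrid[:size, :size]
    c = (size - 1) / 2.0
    mask = ((xx - c) / radius) ** 2 + ((yy - c) / (radius * aspect)) ** 2 <= 1.0
    return mask.astype(float).ravel()


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment, consistent with the returned centroids.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), centroids


def order_clusters_by_pc1(X, centroids):
    """Hypothetical ordering (NOT PPKC): rank centroids along the first PC.

    The sign of a principal component is arbitrary, so the ranking may come
    out reversed; only the relative sequence is meaningful.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = (centroids - mean) @ vt[0]
    return np.argsort(scores)


# Four synthetic shape groups that differ in height-to-width ratio.
rng = np.random.default_rng(1)
X = np.array([
    make_silhouette(aspect + jitter)
    for aspect in (0.6, 0.8, 1.0, 1.2)
    for jitter in rng.normal(0.0, 0.02, size=25)
])

labels, centroids = kmeans(X, k=4)
order = order_clusters_by_pc1(X, centroids)
print("cluster sizes:", np.bincount(labels, minlength=4))
print("cluster order along PC1:", order)
```

Working on flattened pixel vectors keeps each centroid interpretable as an average silhouette, which is one plausible route to human-recognizable categories. In practice, k would be varied and the resulting partitions compared rather than fixed at four in advance.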
