Multi-dimensional machine learning approaches for fruit shape phenotyping in strawberry

Abstract

Background: Shape is a critical element of the visual appeal of strawberry fruit and is influenced by both genetic and non-genetic determinants. Current approaches to phenotyping the external characteristics of strawberry fruit often rely on the human eye to make categorical assessments. However, fruit shape is an inherently multi-dimensional, continuously variable trait that is not adequately described by a single categorical or quantitative feature. Morphometric approaches enable the study of complex, multi-dimensional forms but are often abstract and difficult to interpret. In this study, we developed a mathematical approach, the Principal Progression of k Clusters (PPKC), for transforming fruit shape classifications from digital images onto an ordinal scale. We used these human-recognizable shape categories to select the quantitative features, extracted from multiple morphometric analyses, that are best suited for genetic dissection and analysis.

Results: We transformed images of strawberry fruit into human-recognizable categories using unsupervised machine learning, discovered 4 principal shape categories, and inferred their progression using PPKC. We extracted 68 quantitative features from digital images of strawberries using a suite of morphometric analyses and multivariate statistical approaches. These analyses defined informative feature sets that effectively captured quantitative differences between shape classes. Classification accuracy for the newly created phenotypic variables describing shape ranged from 68% to 99%.

Conclusions: Our results demonstrated that strawberry fruit shapes could be robustly quantified, accurately classified, and empirically ordered using image analyses, machine learning, and PPKC. We generated a dictionary of quantitative traits for studying and predicting shape classes and for identifying genetic factors underlying phenotypic variability for fruit shape in strawberry. The methods and approaches that we applied in strawberry should apply to other fruits, vegetables, and specialty crops.
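As a rough illustration of the unsupervised step described in the abstract, the sketch below groups flattened binary fruit silhouettes with a plain k-means loop. Everything in it is an assumption made for illustration: the abstract says only "unsupervised machine learning", so the choice of k-means, the synthetic ellipse silhouettes, the farthest-first seeding, and the helper names (`ellipse_mask`, `farthest_first_init`, `assign`, `kmeans`) are hypothetical and are not the authors' pipeline. The PPKC ordering step is not reproduced here because the abstract does not specify how it is computed.

```python
import numpy as np


def ellipse_mask(rx, ry, size=16):
    """Synthetic stand-in for a segmented fruit image: a binary ellipse
    silhouette on a size x size grid, flattened to a 1-D vector."""
    y, x = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0
    inside = ((x - c) / rx) ** 2 + ((y - c) / ry) ** 2 <= 1.0
    return inside.astype(float).ravel()


def farthest_first_init(X, k):
    """Deterministic seeding (an illustrative choice): start at X[0], then
    repeatedly add the point farthest from the seeds chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        d = ((X[:, None, :] - X[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(d.argmax()))
    return X[chosen].copy()


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per row."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Alternate assignment and centroid updates until centroids stop moving."""
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        updated = np.array([
            # Keep the old centroid if a cluster happens to be empty.
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(X, centroids), centroids


# Two hypothetical shape groups: wide and tall silhouettes.
wide = [ellipse_mask(rx, ry) for rx, ry in [(7, 2.5), (7, 3), (6.5, 2.5)]]
tall = [ellipse_mask(rx, ry) for rx, ry in [(2.5, 7), (3, 7), (2.5, 6.5)]]
X = np.vstack(wide + tall)

labels, centroids = kmeans(X, k=2)
print(labels)
```

Each centroid is itself an image-sized vector, so reshaping it to 16 x 16 gives an average silhouette for its cluster, which is one way cluster membership can be related back to shapes a person can recognize.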

[1]  Mao Li,et al.  Topological Data Analysis as a Morphometric Method: Using Persistent Homology to Demarcate a Leaf Morphospace , 2018, Front. Plant Sci..

[2]  Christina Gloeckner,et al.  Modern Applied Statistics With S , 2003 .

[3]  Girish Chowdhary,et al.  In‐Field Whole‐Plant Maize Architecture Characterized by Subcanopy Rovers and Latent Space Phenotyping , 2019, The Plant Phenome Journal.

[4]  I. Dworkin,et al.  An image database of Drosophila melanogaster wings for phenomic and biometric analysis , 2015, GigaScience.

[5]  Fang Wang,et al.  SIOX plugin in ImageJ: area measurement made easy , 2017 .

[6]  M. Yano,et al.  SmartGrain: High-Throughput Phenotyping Software for Measuring Seed Shape through Image Analysis1[C][W][OA] , 2012, Plant Physiology.

[7]  Denis Mestivier,et al.  AutoClass@IJM: a powerful tool for Bayesian classification of heterogeneous data in biology , 2009, Nucleic Acids Res..

[8]  Fred L. Bookstein,et al.  Landmark methods for forms without landmarks: morphometrics of group differences in outline shape , 1997, Medical Image Anal..

[9]  V. Kshirsagar,et al.  Face recognition using Eigenfaces , 2011, 2011 3rd International Conference on Computer Research and Development.

[10]  A. Michel,et al.  Distribution of SUN, OVATE, LC, and FAS in the Tomato Germplasm and the Relationship to Fruit Shape Diversity1[C][W][OA] , 2011, Plant Physiology.

[11]  T. C. Nesbitt,et al.  fw2.2: a quantitative trait locus key to the evolution of tomato fruit size. , 2000, Science.

[12]  Charles R. Giardina,et al.  Elliptic Fourier features of a closed contour , 1982, Comput. Graph. Image Process..

[13]  R. Varshney,et al.  Genomic Selection for Crop Improvement , 2017, Springer International Publishing.

[14]  Abdolvahab Ehsanirad Plant Classification Based on Leaf Recognition , 2010 .

[15]  G. de los Campos,et al.  Threshold Models for Genome-Enabled Prediction of Ordinal Categorical Traits in Plant Breeding , 2014, G3: Genes, Genomes, Genetics.

[16]  Baskar Ganapathysubramanian,et al.  Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning , 2018, PLoS Comput. Biol..

[17]  T. Gradziel,et al.  Application of a Bayesian ordinal animal model for the estimation of breeding values for the resistance to Monilinia fruticola (G.Winter) Honey in progenies of peach [Prunus persica (L.) Batsch] , 2017, Breeding science.

[18]  B. Liu,et al.  Interlinked regulatory loops of ABA catabolism and biosynthesis coordinate fruit growth and ripening in woodland strawberry , 2018, Proceedings of the National Academy of Sciences.

[19]  Glenn S. Cole,et al.  Genome-Wide Association Mapping Uncovers Fw1, a Dominant Gene Conferring Resistance to Fusarium Wilt in Strawberry , 2018, G3: Genes, Genomes, Genetics.

[20]  M. Goddard,et al.  Genomic selection. , 2007, Journal of animal breeding and genetics = Zeitschrift fur Tierzuchtung und Zuchtungsbiologie.

[21]  E. Knaap,et al.  A common genetic mechanism underlies morphological diversity in fruits and other plant organs , 2018, Nature Communications.

[22]  Carlos A Manacorda,et al.  Arabidopsis phenotyping through geometric morphometrics , 2017, bioRxiv.

[23]  J. Cheverud,et al.  Quantitative genetics of skeletal nonmetric traits in the rhesus macaques on Cayo Santiago. II. Phenotypic, genetic, and environmental correlations between traits. , 1981, American journal of physical anthropology.

[24]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[25]  Przemyslaw Prusinkiewicz,et al.  Latent Space Phenotyping: Automatic Image-Based Phenotyping for Treatment Studies , 2019, bioRxiv.

[26]  D. Chitwood,et al.  Cc-by-nc-nd 4.0 International License , 2022 .

[27]  Jorge Victorino,et al.  Contour analysis for interpretable leaf shape category discovery , 2019, Plant Methods.

[28]  Christopher N Topp,et al.  Revealing plant cryptotypes: defining meaningful phenotypes among infinite traits. , 2015, Current opinion in plant biology.

[29]  A. Iezzoni,et al.  Large-Scale Standardized Phenotyping of Strawberry in RosBREED , 2013 .

[30]  H. J. Kim,et al.  Mapping of two suppressors of OVATE (sov) loci in tomato , 2013, Heredity.

[31]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[32]  L. Antanaviciute Genetic mapping and phenotyping plant characteristics, fruit quality and disease resistance traits in octoploid strawberry (Fragaria × ananassa) , 2016 .

[33]  T. A. Martin,et al.  Accuracy of Genomic Selection Methods in a Standard Data Set of Loblolly Pine (Pinus taeda L.) , 2012, Genetics.

[34]  P. McCullagh Regression Models for Ordinal Data , 1980 .

[35]  J. Gower Generalized procrustes analysis , 1975 .

[36]  Mao Li,et al.  The Persistent Homology Mathematical Framework Provides Enhanced Genotype-to-Phenotype Associations for Plant Morphology1[OPEN] , 2018, Plant Physiology.

[37]  Takeshi Hayashi,et al.  Genomic Prediction of Biological Shape: Elliptic Fourier Analysis and Kernel Partial Least Squares (PLS) Regression Applied to Grain Shape Prediction in Rice (Oryza sativa L.) , 2015, PloS one.

[38]  A. Moing,et al.  Genetic dissection of fruit quality traits in the octoploid cultivated strawberry highlights the role of homoeo-QTL in their control , 2012, Theoretical and Applied Genetics.

[39]  Jeroen Ooms,et al.  Advanced Graphics and Image-Processing in R [R package magick version 2.4.0] , 2020 .

[40]  Cédric Gaucherel,et al.  Momocs: Outline Analysis Using R , 2014 .

[41]  Guifang Fu,et al.  A statistical model for mapping morphological shape , 2010, Theoretical Biology and Medical Modelling.

[42]  Construction of a dense SNP map of a highly heterozygous diploid potato population and QTL analysis of tuber shape and eye depth , 2014, Theoretical and Applied Genetics.

[43]  Xianzhong Feng,et al.  Evolution through genetically controlled allometry space. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[44]  Sofia Visa,et al.  Modeling of tomato fruits into nine shape categories using elliptic fourier shape modeling and Bayesian classification of contour morphometric data , 2014, Euphytica.

[45]  S. Myles,et al.  Genome to Phenome Mapping in Apple Using Historical Data , 2016, The plant genome.

[46]  Nathan D. Miller,et al.  An Automated Image Analysis Pipeline Enables Genetic Studies of Shoot and Root Morphology in Carrot (Daucus carota L.) , 2018, Front. Plant Sci..

[47]  A. Agresti Analysis of Ordinal Categorical Data: Agresti/Analysis , 2010 .

[48]  W. Ewens Genetics and analysis of quantitative traits , 1999 .

[49]  Wendy Moncur,et al.  The Accuracy and Reliability of Crowdsource Annotations of Digital Retinal Images , 2016, Translational vision science & technology.

[50]  C. Klingenberg,et al.  QUANTITATIVE GENETICS OF GEOMETRIC SHAPE IN THE MOUSE MANDIBLE , 2001, Evolution; international journal of organic evolution.

[51]  Glenn S. Cole,et al.  Domestication of Temperate and Coastal Hybrids with Distinct Ancestral Gene Selection in Octoploid Strawberry , 2018, The plant genome.

[52]  K. Lewers,et al.  Antioxidant Capacity and Flavonoid Content in Wild Strawberries , 2007 .

[53]  S. Tanksley The Genetic, Developmental, and Molecular Bases of Fruit Size and Shape Variation in Tomato , 2004, The Plant Cell Online.

[54]  Graham W. Horgan,et al.  Use of statistical image analysis to discriminate carrot cultivars , 2001 .

[55]  Kevin M. Stoffel,et al.  Quantitative Trait Loci Controlling Fruit Size and Other Horticultural Traits in Bell Pepper (Capsicum annuum) , 2018, The plant genome.

[56]  José Luis Micol,et al.  Mutational spaces for leaf shape and size , 2008, HFSP journal.

[57]  Byoung-Cheorl Kang,et al.  An ultra-high-density bin map facilitates high-throughput QTL mapping of horticultural traits in pepper (Capsicum annuum) , 2016, DNA research : an international journal for rapid publication of reports on genes and genomes.

[58]  E. Knaap,et al.  Genome organization of the tomato sun locus and characterization of the unusual retrotransposon Rider. , 2009, The Plant journal : for cell and molecular biology.

[59]  R. Lande,et al.  Efficiency of marker-assisted selection in the improvement of quantitative traits. , 1990, Genetics.

[60]  K. Eskridge,et al.  Genomic-Enabled Prediction of Ordinal Data with Bayesian Logistic Ordinal Regression , 2015, G3: Genes, Genomes, Genetics.

[61]  P. McCullagh Analysis of Ordinal Categorical Data , 1985 .

[62]  M. Battino,et al.  Increasing Strawberry Fruit Sensorial and Nutritional Quality Using Wild and Cultivated Germplasm , 2012, PloS one.

[63]  A. Monforte,et al.  The genetic basis of fruit morphology in horticultural crops: lessons from tomato and melon. , 2013, Journal of experimental botany.

[64]  L. Brewer,et al.  Heritability of fruit shape in pears , 2000, Euphytica.

[65]  Satish Kumar,et al.  Marker-trait associations and genomic predictions of interspecific pear (Pyrus) fruit characteristics , 2019, Scientific Reports.

[66]  E. Baldwin,et al.  Historical Trends in Strawberry Fruit Quality Revealed by a Trial of University of Florida Cultivars and Advanced Selections , 2011 .

[67]  Yves Rosseel,et al.  lavaan: An R Package for Structural Equation Modeling , 2012 .

[68]  A. Fernie,et al.  Genetic diversity of strawberry germplasm using metabolomic biomarkers , 2018, Scientific Reports.

[69]  The workloads of farmers who sort and pack strawberries in accordance with standards of shipment and their awareness of standards of shipment. , 1989 .

[70]  Graham W. Horgan,et al.  The statistical analysis of plant part appearance — a review , 2001 .

[71]  David J Hearn Shape analysis for the automated identification of plants from images of leaves. , 2009 .

[72]  B. J. Hayes,et al.  Genomic selection: Genomic selection , 2007 .

[73]  P. Carneiro,et al.  New Proposals to Estimate Unbiased Selection Gain and Coefficient of Variation in Traits Evaluated Using Score Scales , 2019, Crop Science.

[74]  T. Hasing,et al.  Estimation of Genetic Parameters for 12 Fruit and Vegetative Traits in the University of Florida Strawberry Breeding Population , 2012 .

[75]  Michael F. Covington,et al.  A Modern Ampelography: A Genetic Basis for Leaf Shape and Venation Patterning in Grape1[C][W][OPEN] , 2013, Plant Physiology.

[76]  Ning Jiang,et al.  Morphological Variation of Tomato Fruit A Retrotransposon-Mediated Gene Duplication Underlies , 2014 .

[77]  Julien Claude,et al.  Morphometrics with R , 2009 .

[78]  L Sirovich,et al.  Low-dimensional procedure for the characterization of human faces. , 1987, Journal of the Optical Society of America. A, Optics and image science.

[79]  G. Darrow The strawberry : history, breeding and physiology , 1966 .

[80]  James B. Schreiber,et al.  Reporting Structural Equation Modeling and Confirmatory Factor Analysis Results: A Review , 2006 .

[81]  S. Tanksley,et al.  A new class of regulatory genes underlying the cause of pear-shaped tomato fruit , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[82]  Keisuke Nonaka,et al.  Genome-wide association study and genomic prediction in citrus: Potential of genomics-assisted breeding for fruit quality traits , 2017, Scientific Reports.

[83]  A. Aharoni,et al.  Gain and Loss of Fruit Flavor Compounds Produced by Wild and Cultivated Strawberry Species , 2004, The Plant Cell Online.

[84]  Jeffrey P. Mower,et al.  Origin and evolution of the octoploid strawberry genome , 2019, Nature Genetics.

[85]  C. Granier,et al.  Phenotyping and beyond: modelling the relationships between traits. , 2014, Current opinion in plant biology.

[86]  R. Bernardo,et al.  Germplasm Architecture Revealed through Chromosomal Effects for Quantitative Traits in Maize , 2016, The plant genome.

[87]  M. Sillanpää,et al.  Dynamic Quantitative Trait Locus Analysis of Plant Phenomic Data. , 2015, Trends in plant science.

[88]  Jean-Michel Poggi,et al.  VSURF: An R Package for Variable Selection Using Random Forests , 2015, R J..

[89]  Abhijit Ghatak,et al.  Deep Learning with R , 2019, Springer Singapore.

[90]  I. Dan,et al.  CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE LEARNING , 2018, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

[91]  Johannes E. Schindelin,et al.  Fiji: an open-source platform for biological-image analysis , 2012, Nature Methods.

[92]  D. Bates,et al.  Fitting Linear Mixed-Effects Models Using lme4 , 2014, 1406.5823.

[93]  Kevin W Eliceiri,et al.  NIH Image to ImageJ: 25 years of image analysis , 2012, Nature Methods.

[94]  Mathilde A. F. Balduzzi,et al.  Reshaping Plant Biology: Qualitative and Quantitative Descriptors for Plant Morphology , 2017, Front. Plant Sci..

[95]  G. Evanno,et al.  Detecting the number of clusters of individuals using the software structure: a simulation study , 2005, Molecular ecology.