Investigating the use of gradient boosting machine, random forest and their ensemble to predict skin flavonoid content from berry physical-mechanical characteristics in wine grapes

Texture properties and flavonoid content were measured in 22 wine-grape cultivars. Regression trees, random forests and gradient boosting machines were applied to the data. The variance explained by these models was very high in both external and internal validation. Model performance will be further increased by cultivar-specific calibrations.

Flavonoids are a class of bioactive compounds that are abundant in grapevine and wine. They also affect the sensory quality of fruits, vegetables and derived products. The methods available for flavonoid measurement are time-consuming, so a rapid and cost-effective determination of these compounds is an important research objective. This work tests whether applying machine learning techniques to texture analysis data yields good performance for flavonoid estimation in grape berries. Whole-berry and skin texture analysis was applied to berries from 22 red wine grape cultivars and linked to the total flavonoid content. Three machine learning techniques (regression tree, random forest and gradient boosting machine) were then applied. The models reached high accuracy in both the external and the internal validation. R² ranged from 0.75 to 0.85 for the external validation and from 0.65 to 0.75 for the internal validation, while the RMSE (root mean square error) ranged from 0.95 mg g⁻¹ to 0.7 mg g⁻¹ in the external validation and from 1.3 mg g⁻¹ to 1.1 mg g⁻¹ in the internal validation.
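The two figures of merit reported above, R² and RMSE, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the numbers and the unweighted averaging of two models are all assumptions made for illustration, since the abstract does not state how the ensemble was formed.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def average_ensemble(*prediction_sets):
    """Unweighted mean of several models' predictions (an assumed scheme)."""
    return [sum(values) / len(values) for values in zip(*prediction_sets)]

# Made-up values for illustration only, NOT data from the study (mg/g).
observed = [2.0, 4.0, 6.0, 8.0]
model_a = [3.0, 5.0, 6.0, 7.0]  # hypothetical predictions of a first model
model_b = [2.0, 3.0, 7.0, 9.0]  # hypothetical predictions of a second model
ensemble = average_ensemble(model_a, model_b)

for name, pred in [("A", model_a), ("B", model_b), ("ensemble", ensemble)]:
    print(name, round(r_squared(observed, pred), 3), round(rmse(observed, pred), 3))
```

In this toy case the averaged predictions score better than either model alone because the two models' errors partly cancel. That is the general motivation for ensembling, not a result of the study.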
