Random forests: A machine learning methodology to highlight the volatile organic compounds involved in olfactory perception

Abstract: The purpose of this paper is to discuss the application of the Random Forest methodology to sensory analysis. A mainly methodological point of view is adopted in order to describe, as simply as possible, the construction of binary decision trees, more precisely Classification and Regression Trees (CART), and the generation of an ensemble of such trees, that is, a Random Forest. The value of the permutation accuracy criterion as a measure of variable importance is emphasized: it identifies the most predictive variables and supports the selection of a subset of them for parsimonious and efficient predictive models. A two-step procedure is proposed for choosing this subset of variables. The principle of the method is illustrated in a case study whose aim was to better understand and predict the olfactory characteristics of red wines made from the Cabernet Franc grape variety on the basis of their Volatile Organic Compound (VOC) content. For two main olfactory attributes, the bell pepper odor and the leather odor, it was possible to list the most important compounds and to highlight a very small number of compounds useful for estimating each of the olfactory attributes considered. For the latter, it was also observed that Random Forest models had a better predictive ability than Partial Least Squares (PLS) Regression models.
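The permutation accuracy criterion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (five hypothetical "compounds", of which only the first drives the response), and the function names `fit_tree`, `predict`, `mse` and `permutation_importance` are assumptions introduced for illustration. Each small regression tree is grown on a bootstrap sample with a random subset of variables tried at each split. The importance of a variable is then the average increase in out-of-bag mean squared error after its values are shuffled.

```python
import random

def fit_tree(X, y, idx, mtry, rng, depth=0, max_depth=3, min_leaf=5):
    """Grow a small CART-style regression tree on the rows listed in idx."""
    mean = sum(y[i] for i in idx) / len(idx)
    if depth == max_depth or len(idx) < 2 * min_leaf:
        return mean  # leaf: predict the mean response
    best = None
    for j in rng.sample(range(len(X[0])), mtry):  # random subset of variables
        order = sorted(idx, key=lambda i: X[i][j])
        for k in range(min_leaf, len(order) - min_leaf + 1):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between tied values
            left, right = order[:k], order[k:]
            ml = sum(y[i] for i in left) / len(left)
            mr = sum(y[i] for i in right) / len(right)
            sse = (sum((y[i] - ml) ** 2 for i in left)
                   + sum((y[i] - mr) ** 2 for i in right))
            if best is None or sse < best[0]:
                best = (sse, j, (lo + hi) / 2, left, right)
    if best is None:
        return mean
    _, j, thr, left, right = best
    grow = lambda rows: fit_tree(X, y, rows, mtry, rng, depth + 1, max_depth, min_leaf)
    return (j, thr, grow(left), grow(right))

def predict(tree, row):
    """Follow the splits down to a leaf and return its mean."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree

def mse(tree, rows, targets):
    return sum((predict(tree, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(X, y, n_trees=30, mtry=2, seed=0):
    """Average increase in out-of-bag MSE when one variable is permuted."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]   # out-of-bag rows
        tree = fit_tree(X, y, boot, mtry, rng)
        targets = [y[i] for i in oob]
        base = mse(tree, [X[i] for i in oob], targets)
        for j in range(p):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)                        # break the link with y
            rows = [X[i][:j] + [v] + X[i][j + 1:] for i, v in zip(oob, shuffled)]
            importance[j] += (mse(tree, rows, targets) - base) / n_trees
    return importance

# Synthetic example: only variable 0 is related to the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [3 * row[0] + 0.5 * data_rng.gauss(0, 1) for row in X]

importance = permutation_importance(X, y)
ranking = sorted(range(5), key=lambda j: importance[j], reverse=True)
print("variables ranked by importance:", ranking)
```

Ranking the variables in this way corresponds only to the first stage of a variable selection. The abstract does not detail the two-step procedure, so the sketch does not attempt to reproduce it.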