Element selection and concentration analysis for classifying South American wine samples according to their country of origin

Abstract

This paper proposes a feature selection approach for classifying wine samples according to their place of origin. The method relies on the non-parametric Kruskal-Wallis test to remove non-significant features and on Linear Discriminant Analysis (LDA) to derive a feature importance index. Features ranked by that index are then added one at a time, and classification performance is assessed after each insertion. The number of selected features is the one that yields the maximum accuracy in repeated 10-fold cross-validation. To improve categorization accuracy, several classification techniques are tested. The framework was applied to a dataset of 53 wine samples from four South American countries (Argentina, Brazil, Chile, and Uruguay), described by the concentrations of 45 chemical elements determined by ICP-OES and ICP-MS. It achieved an average classification accuracy of 99.9% on the test set and retained, on average, 6.73 of the 45 original elements. The retained chemical elements were then assessed qualitatively.
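The three-stage pipeline described above can be sketched in pure Python as follows. The stages are a Kruskal-Wallis filter, importance ranking, and one-at-a-time insertion scored by repeated 10-fold cross-validation. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define its LDA-based importance index, so a univariate Fisher discriminant ratio (the one-feature LDA criterion) stands in for it. A 3-nearest-neighbour classifier stands in for the classification techniques that were tested. The significance level α = 0.05 and the five CV repeats are assumed values. `make_synthetic_wines` is a hypothetical generator of toy data carrying the four country labels; it does not reproduce the paper's ICP-OES/ICP-MS measurements.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    y = x / 2.0
    # Start at Q(1/2, y) for odd df or Q(1, y) for even df ...
    q, a = (math.erfc(math.sqrt(y)), 0.5) if df % 2 else (math.exp(-y), 1.0)
    # ... then climb with Q(a + 1, y) = Q(a, y) + y**a * e**(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def kruskal_wallis_p(values, labels):
    """Asymptotic, tie-corrected Kruskal-Wallis p-value for one feature."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks, tie_term, i = [0.0] * n, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # tied values share their average 1-based rank
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        tie_term += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    groups = {g: [r for r, lab in zip(ranks, labels) if lab == g] for g in set(labels)}
    h = 12.0 / (n * (n + 1)) * sum(
        len(rs) * (mean(rs) - (n + 1) / 2.0) ** 2 for rs in groups.values())
    correction = 1.0 - tie_term / (n ** 3 - n)
    return 1.0 if correction <= 0 else chi2_sf(h / correction, len(groups) - 1)


def fisher_ratio(values, labels):
    """Between-class over within-class scatter of one feature (stand-in index)."""
    grand, between, within = mean(values), 0.0, 0.0
    for g in set(labels):
        vs = [v for v, lab in zip(values, labels) if lab == g]
        m = mean(vs)
        between += len(vs) * (m - grand) ** 2
        within += sum((v - m) ** 2 for v in vs)
    return between / within if within > 0 else float("inf")


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k nearest training rows (squared Euclidean)."""
    nearest = sorted((sum((a - b) ** 2 for a, b in zip(row, x)), lab)
                     for row, lab in zip(train_x, train_y))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def repeated_cv_accuracy(X, y, cols, folds=10, repeats=5, seed=0):
    """Mean fold accuracy of k-NN on columns `cols` under repeated k-fold CV."""
    rng = random.Random(seed)  # fixed seed: every subset is scored on the same splits
    scores = []
    for _ in range(repeats):
        idx = rng.sample(range(len(y)), len(y))
        for f in range(folds):
            test = idx[f::folds]
            train = [i for p, i in enumerate(idx) if p % folds != f]
            # z-score with training-fold statistics only, so test rows do not leak in
            mu = [mean(X[i][c] for i in train) for c in cols]
            sd = [pstdev([X[i][c] for i in train]) or 1.0 for c in cols]
            def scale(i):
                return [(X[i][c] - m) / s for c, m, s in zip(cols, mu, sd)]
            train_x, train_y = [scale(i) for i in train], [y[i] for i in train]
            hits = sum(knn_predict(train_x, train_y, scale(i)) == y[i] for i in test)
            scores.append(hits / len(test))
    return mean(scores)


def select_features(X, y, alpha=0.05, **cv_args):
    """Kruskal-Wallis filter, importance ranking, then one-at-a-time insertion."""
    columns = [list(c) for c in zip(*X)]
    kept = [j for j, col in enumerate(columns) if kruskal_wallis_p(col, y) < alpha]
    ranked = sorted(kept, key=lambda j: fisher_ratio(columns[j], y), reverse=True)
    best_acc, best_m = -1.0, 0
    for m in range(1, len(ranked) + 1):
        acc = repeated_cv_accuracy(X, y, ranked[:m], **cv_args)
        if acc > best_acc:  # strict ">" keeps the smaller subset on ties
            best_acc, best_m = acc, m
    return ranked[:best_m], best_acc


def make_synthetic_wines(seed=42, per_class=12):
    """HYPOTHETICAL toy data, not real measurements: feature 0 separates all four
    classes, feature 1 separates two pairs, features 2-5 are pure noise."""
    rng = random.Random(seed)
    y = [c for c in ("AR", "BR", "CL", "UY") for _ in range(per_class)]
    X = [[3.0 * (i // per_class) + rng.gauss(0, 0.5),
          4.0 * (i // per_class % 2) + rng.gauss(0, 0.5)]
         + [rng.gauss(0, 1) for _ in range(4)] for i in range(len(y))]
    return X, y
```

For example, `select_features(*make_synthetic_wines())` returns a subset led by feature 0, the toy column built to separate all four classes, together with its mean cross-validated accuracy. Scoring every candidate subset on the same seeded splits makes the comparison between subset sizes a paired one. The strict `>` means the smallest subset wins when accuracies tie.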