Predictive geographical authentication of green tea with protected designation of origin using a random forest model

Abstract: Reliable origin authentication methods are critical for protecting high-value food products with designated geographical origins. A total of 623 tea samples were collected from major green tea production regions across China from 2012 to 2016. A random forest (RF) model with 19 input predictors (e.g., δ13C, 24Mg, 85Rb, and 206Pb/207Pb) was developed. The RF model discriminated Westlake Xihu Longjing green tea (XHLJ) from tea from other regions with an accuracy of 97.6%, and also correctly identified green tea from surrounding regions with an accuracy of 97.9%. Geographical discrimination of tea harvested in subsequent years was also reliable, with predictive accuracies higher than 91%. 85Rb, 24Mg, δ13C and 39K were the most important proxies for determining the geographical origin of tea, with relative contributions of 20.6%, 12.5%, 12.1% and 7.4%, respectively. The RF model showed higher classification accuracy than other commonly used chemometric models and provides new insight into the use of predictive models built on historical data for geographical authentication of agricultural products with Protected Designation of Origin (PDO).
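
The core idea of the RF approach (many weak classifiers, each trained on a bootstrap sample and a random subset of predictors, combined by majority vote) can be illustrated with a minimal sketch. This is not the authors' model or data: the trees are reduced to one-split stumps, the six predictors and the class shifts are synthetic, and the "usage share" printed at the end is only a crude stand-in for the relative contributions reported in the abstract. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter

def make_synthetic(n_per_class, n_features, seed):
    """Synthetic stand-in data (assumption): class 1 is shifted on features 0 and 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                row[0] += 3.0  # strongly informative predictor
                row[1] += 2.0  # moderately informative predictor
            X.append(row)
            y.append(label)
    return X, y

def fit_stump(X, y, features):
    """Return the (feature, threshold, above_is_one) split with the fewest training errors."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            # Errors if we predict class 1 whenever the value exceeds the threshold.
            err_above = sum((row[f] > t) != (label == 1) for row, label in zip(X, y))
            for above_is_one, err in ((True, err_above), (False, len(y) - err_above)):
                if best is None or err < best[0]:
                    best = (err, f, t, above_is_one)
    return best[1:]

def predict_stump(stump, row):
    f, t, above_is_one = stump
    return int((row[f] > t) == above_is_one)

def fit_forest(X, y, n_trees, n_sub, seed):
    """Each stump sees a bootstrap sample and a random subset of n_sub features."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # sample rows with replacement
        feats = rng.sample(range(n_features), n_sub)       # random predictor subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Train on one synthetic draw and evaluate on an independent one.
X_train, y_train = make_synthetic(80, 6, seed=0)
X_test, y_test = make_synthetic(80, 6, seed=1)
forest = fit_forest(X_train, y_train, n_trees=51, n_sub=3, seed=42)

accuracy = sum(predict_forest(forest, r) == l for r, l in zip(X_test, y_test)) / len(y_test)

# Share of stumps that split on each feature: a rough proxy for predictor importance.
usage = Counter(s[0] for s in forest)
shares = {f: c / len(forest) for f, c in usage.items()}

print(f"held-out accuracy: {accuracy:.3f}")
for f, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"feature {f}: used by {share:.1%} of stumps")
```

In practice a full implementation would grow deeper trees and use an established importance measure (for example permutation importance or mean decrease in impurity); the sketch only shows why informative predictors end up dominating the ensemble.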
