Discovery of food identity markers by metabolomics and machine learning technology

Verifying food authenticity establishes consumer trust in food ingredients and in the components of processed food. Alongside genetic and protein markers, chemical compounds can serve as unique identifiers of food components. Non-targeted metabolomics is well suited to screening for food markers when coupled with efficient data analysis. This study explored the feasibility of random forest (RF) machine learning, specifically its inherent feature extraction, for non-targeted metabolic marker discovery. The distinction of chia, linseed, and sesame, seeds that have gained attention as “superfoods”, served as the test case. Chemical fractions of non-processed seeds and of wheat cookies containing seed ingredients were profiled. RF classified the original seeds unambiguously but appeared overdesigned for material with unique secondary metabolites, such as sesamol in sesame or rosmarinic acid in chia, a member of the Lamiaceae. Most unique metabolites were diluted or lost during cookie production, but RF still classified the presence of seed ingredients in cookies with 6.7% overall error and revealed food processing markers, such as 4-hydroxybenzaldehyde for chia additions and succinic acid monomethylester for linseed additions. RF-based feature extraction was adequate for difficult classifications, but marker selection should not proceed without human supervision. Combining RF with alternative data analysis technologies is advised, as is further testing across a wide range of seeds and food processing methods.
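The idea of using RF feature extraction for marker discovery can be illustrated with a minimal sketch. It is not the study's pipeline: it assumes scikit-learn as a stand-in RF implementation, uses purely synthetic data (60 hypothetical samples, 50 hypothetical metabolite features, one planted marker per seed class), and ranks features by mean decrease in impurity, which is only one of several possible importance measures. All names and parameter values below are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a metabolite profile matrix (samples x features).
# These are NOT measured data; all values are random noise plus planted markers.
classes = ["chia", "linseed", "sesame"]
n_per_class, n_features = 20, 50
X = rng.normal(0.0, 1.0, size=(n_per_class * len(classes), n_features))
y = np.repeat(classes, n_per_class)

# Plant one hypothetical class-specific marker in columns 0, 1 and 2.
for k, name in enumerate(classes):
    X[y == name, k] += 5.0

feature_names = [f"feature_{i}" for i in range(n_features)]

# Fit the forest; oob_score=True estimates accuracy from out-of-bag samples,
# so no separate test set is needed for this rough error estimate.
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
oob_error = 1.0 - rf.oob_score_

# Rank features by impurity-based importance, highest first.
ranking = np.argsort(rf.feature_importances_)[::-1]
top_markers = [feature_names[i] for i in ranking[:3]]

print(f"OOB error: {oob_error:.3f}")
print("Candidate markers:", top_markers)
```

In this toy setting the planted features dominate the ranking. With real profiles, a high-ranking feature is only a candidate and would still need manual inspection, in line with the call for human supervision of marker selection.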
