Chemometrics in foodomics: Handling data structures from multiple analytical platforms

Abstract: Foodomics studies are normally concerned with multifactorial problems, so it makes good sense to measure and explore the same samples on complementary, synergistic analytical platforms that comprise multifactorial sensors and separation methods. However, the challenge of exploring, extracting and describing the data increases exponentially, and the risk of becoming flooded with non-informative data increases concomitantly. Acquiring data from different analytical platforms provides opportunities for checking the validity of the data, comparing analytical platforms and ensuring proper data (pre)processing, all in the context of correlation studies. We provide practical and pragmatic tools to validate and to deal advantageously with data from more than one analytical platform. We emphasize the need for complementary correlation studies within and between blocks of data to ensure proper data handling, interpretation and dissemination. Correlation studies can serve as a preliminary step prior to multivariate data analysis or as an introduction to more advanced multi-block methods.
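As a rough illustration of what a correlation study within and between data blocks can look like, the sketch below computes Pearson correlations between every variable in one block and every variable in another, measured on the same samples. It is not the authors' procedure. The function name `block_correlation`, the block names `nmr` and `lcms`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def block_correlation(X, Y):
    """Pearson correlation between each column of X and each column of Y.

    X and Y are (samples x variables) blocks measured on the same samples.
    Returns a (p x q) matrix. Constant columns give NaN entries.
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("blocks must share the same samples (rows)")
    # Center each variable, then scale to unit length so that the
    # cross-product of two columns equals their Pearson correlation.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    Ys = Yc / np.linalg.norm(Yc, axis=0)
    return Xs.T @ Ys


# Synthetic stand-ins for two platforms (assumed data, not from the paper):
# one variable in each block tracks a shared latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=30)
nmr = np.column_stack([
    latent + 0.1 * rng.normal(size=30),
    rng.normal(size=30),
])
lcms = np.column_stack([
    2.0 * latent + 0.1 * rng.normal(size=30),
    rng.normal(size=30),
    rng.normal(size=30),
])

between = block_correlation(nmr, lcms)  # between-block correlations
within = block_correlation(nmr, nmr)    # within-block correlations
```

In this toy setup, a high entry in `between` flags a pair of variables from different platforms that vary together across samples, which is the kind of cross-check the abstract describes for validating data and preprocessing before moving to multi-block models.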
