From numbers to a biological sense: How the strategy chosen for metabolomics data treatment may affect final results. A practical example based on urine fingerprints obtained by LC‐MS

The application of high‐throughput technologies in metabolomics studies increases the quantity of data obtained, which in turn creates several problems during data analysis. A correctly and clearly formulated biological question and comprehensive knowledge of the data structure and properties are necessary to select proper chemometric tools. However, the broad range of chemometric tools available for metabolomics data makes this choice challenging. Precisely performed data treatment enables the extraction of valuable information and its proper interpretation. The effect of an error made at an early stage is amplified through the later stages and, combined with errors made at subsequent steps, can accumulate and significantly affect data interpretation. Moreover, adequate application of these tools may help not only to detect, but sometimes also to correct, biological, analytical, or methodological errors that may affect the reliability of the results obtained. This report presents the steps and tools used for LC‐MS‐based metabolomics data extraction, reduction, and visualization. By following steps such as data reprocessing, data pretreatment, data treatment, and data revision, the authors show how to extract valuable information and how to avoid misinterpreting the results obtained. The purpose of this work is to emphasize the problematic characteristics of metabolomics data and the necessity of treating them attentively and precisely. The dataset used to illustrate metabolomics data properties and the major data treatment challenges was obtained from an animal model of control and diabetic rats, both with and without rosemary treatment. Urine samples were fingerprinted by LC‐QTOF‐MS.
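To make the pretreatment and reduction steps named above more concrete, the following is a minimal sketch in Python rather than the authors' actual workflow. The presence filter threshold, the log transform, Pareto scaling, and the function names `pretreat` and `pca_scores` are all assumptions chosen for illustration, and the feature table is simulated rather than taken from the rat urine dataset.

```python
import numpy as np

def pretreat(X, min_presence=0.5):
    """Filter, log-transform and Pareto-scale a samples x features matrix.

    Zeros are treated as "not detected" (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    # Keep features detected in at least `min_presence` of the samples.
    present = (X > 0).mean(axis=0) >= min_presence
    X = X[:, present]
    # log1p reduces heteroscedasticity and keeps zeros finite.
    X = np.log1p(X)
    # Pareto scaling: mean-center, then divide by sqrt(standard deviation).
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    return centered / np.sqrt(sd), present

def pca_scores(Xs, n_components=2):
    """PCA scores of an already centered matrix via SVD."""
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Simulated feature table: 12 samples, 40 features (hypothetical data).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=5, sigma=1, size=(12, 40))
X[:, :5] = 0.0     # five features that were never detected
X[:6, 5:10] *= 4   # five features elevated in the first six samples

Xs, kept = pretreat(X)
scores, explained = pca_scores(Xs)
```

Each choice in this sketch (filter threshold, transformation, scaling) changes which features dominate the score plot, which is the kind of early decision whose effect carries through to interpretation.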
