Error Analysis and Propagation in Metabolomics Data Analysis

Error analysis is central to describing the uncertainty in experimental results. In metabolomics it serves several purposes: experimental design, quality control of experiments, the selection of appropriate statistical methods, and the determination of uncertainty in results. Its importance has grown with the increasing number, complexity, and heterogeneity of measurements characteristic of omics research. This increase in data complexity is particularly problematic for metabolomics, which is more heterogeneous than other omics technologies because of the much wider range of molecular entities detected and measured. This review introduces the fundamental concepts of error analysis as they apply to a wide range of metabolomics experimental designs, and it discusses current methodologies for determining how uncertainty propagates through metabolomics data analysis. These methodologies include analytical derivation and approximation techniques, Monte Carlo error analysis, and error analysis in metabolic inverse problems. The current limitations of each methodology with respect to metabolomics data analysis are also discussed.
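As a rough illustration of two of the methodologies named above, the sketch below propagates uncertainty through a simple ratio of two measured quantities, first with a first-order analytical approximation and then with Monte Carlo sampling. It is not taken from the review: the function names, the numerical values, and the assumption of independent, normally distributed measurement errors are all hypothetical choices made for the example.

```python
import math
import random
import statistics


def propagate_ratio_first_order(a, sigma_a, b, sigma_b):
    """First-order (Taylor series) propagation for r = a / b.

    Assumes the errors in a and b are independent and small
    relative to the values themselves.
    """
    ratio = a / b
    relative_sd = math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return ratio, abs(ratio) * relative_sd


def propagate_ratio_monte_carlo(a, sigma_a, b, sigma_b, n_draws=100_000, seed=1):
    """Monte Carlo propagation for r = a / b.

    Draws a and b from independent normal distributions (an assumption),
    computes the ratio for each draw, and summarizes the resulting sample.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(a, sigma_a) / rng.gauss(b, sigma_b) for _ in range(n_draws)]
    return statistics.fmean(draws), statistics.stdev(draws)


# Hypothetical peak intensities with their standard deviations.
analytical = propagate_ratio_first_order(100.0, 5.0, 50.0, 2.0)
simulated = propagate_ratio_monte_carlo(100.0, 5.0, 50.0, 2.0)

print("first-order  ratio, sd:", analytical)
print("Monte Carlo  ratio, sd:", simulated)
```

With small relative errors such as these, the two estimates of the standard deviation should agree closely. The Monte Carlo approach becomes more useful when the errors are large, correlated, or non-normal, where the first-order approximation is less reliable.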
