Probabilistic Approach for Evaluating Metabolite Sample Integrity

The success of a metabolomics study depends on the "fitness" of each biological sample analyzed: the metabolite levels reported for a sample must represent an accurate snapshot of the studied organism's metabolite profile at the time of sample collection. Numerous chemical and biological factors can compromise sample fitness during collection, handling, storage, and preparation for analysis. We propose a probabilistic model for the quantitative assessment of metabolite sample fitness. We discuss the collection and processing of nuclear magnetic resonance (NMR) and ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) metabolomics data, and briefly review feature selection methods for multivariate data analysis, including feature clustering and the computation of latent vectors using spectral methods. We propose that the time course of metabolite changes in samples stored at different temperatures can be used to identify ratios of changing metabolites to stable metabolites as markers of sample fitness, and that tolerance intervals can be computed to characterize these ratios among fresh samples. To discover additional structure in the data relevant to sample fitness, we propose using data labeled according to these ratios to train a Dirichlet process mixture model (DPMM). DPMMs are intuitive in this setting because they model the metabolite levels in a sample as arising from a combination of processes, e.g., normal biological processes and degradation- or contamination-inducing processes. The outputs of a DPMM are probabilities that a sample is associated with each process, and these probabilities can be incorporated into a final classifier for sample fitness.
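As an illustration of the tolerance-interval step, the sketch below computes a distribution-free (order-statistic) tolerance interval for the log of a changing-to-stable metabolite ratio among fresh samples, then checks whether a new sample falls inside it. The abstract does not say which tolerance-interval construction is used, so the nonparametric choice here is an assumption; the function names, the log transform, the 90%/90% settings, and the synthetic data are all hypothetical.

```python
import math
import random


def np_tolerance_confidence(n: int, p: float, r: int = 1) -> float:
    """Confidence that [x_(r), x_(n-r+1)] covers at least a fraction p
    of the population, for n i.i.d. draws from a continuous distribution.
    Equals P(Binomial(n, p) <= n - 2r)."""
    m = n - 2 * r
    if m < 0:
        return 0.0
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m + 1))


def np_tolerance_interval(values, p=0.90, conf=0.90):
    """Tightest symmetric order-statistic interval meeting the requested
    coverage p and confidence conf (hypothetical helper)."""
    xs = sorted(values)
    n = len(xs)
    r = 0
    # Increase r while the narrower interval still meets the confidence target.
    while np_tolerance_confidence(n, p, r + 1) >= conf:
        r += 1
    if r == 0:
        raise ValueError("too few samples for the requested coverage/confidence")
    return xs[r - 1], xs[n - r]


def within_fresh_range(changing: float, stable: float, interval) -> bool:
    """True if log(changing/stable) lies inside the fresh-sample interval."""
    lo, hi = interval
    return lo <= math.log(changing / stable) <= hi


# Synthetic log-ratios standing in for fresh samples (assumption, not real data).
rng = random.Random(0)
fresh_log_ratios = [rng.gauss(0.0, 0.2) for _ in range(60)]
interval = np_tolerance_interval(fresh_log_ratios, p=0.90, conf=0.90)

print("fresh-sample tolerance interval (log ratio):", interval)
print("ratio 1.0 within range:", within_fresh_range(1.0, 1.0, interval))
print("ratio 150 within range:", within_fresh_range(150.0, 1.0, interval))
```

A sample whose ratio falls outside the interval would be a candidate for a degraded or contaminated label; in the proposed workflow such labels would then feed the mixture model rather than serve as the final classifier.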
