Fusing metabolomics data sets with heterogeneous measurement errors

Combining different metabolomics platforms can contribute significantly to the discovery of complementary processes expressed under different conditions. However, analysis of the fused data can be hampered by differences in data quality between platforms. In metabolomics data, one often observes that measurement errors increase with increasing measurement level and that different platforms have different measurement error variances. In this paper we compare three approaches to correcting for this measurement error heterogeneity: transformation of the raw data, weighted filtering before modelling, and a modelling approach that uses a weighted sum of residuals. To illustrate these approaches, we analyse data from healthy obese and diabetic obese individuals, obtained from two metabolomics platforms. For this application, the filtering and modelling approaches, which both estimate a model of the measurement error, did not outperform the data transformation approaches. This is probably due to the limited difference in measurement error between the platforms and to the instability of the estimated measurement error models, given the small number of repeats available. A transformation of the data improves the classification of the two groups.
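As a rough illustration of why a transformation can help when measurement error grows with measurement level, the sketch below simulates repeats under a two-component error model (additive plus multiplicative error) and applies a generalised log transformation. The abstract does not state which transformation was used, so the choice of the generalised log, the function names `glog` and `simulate_repeats`, and all parameter values are assumptions made for illustration only. The transformation parameter is computed from the known simulation settings; in practice it would have to be estimated from the available repeats.

```python
import numpy as np


def glog(y, lam):
    """Generalised log transform: log(y + sqrt(y**2 + lam)), lam > 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y + np.sqrt(y**2 + lam))


def simulate_repeats(levels, sd_add, sd_mult, n_repeats, rng):
    """Simulate repeats under y = mu * exp(eta) + eps (hypothetical settings).

    eta ~ N(0, sd_mult^2) is the multiplicative error component,
    eps ~ N(0, sd_add^2) is the additive error component.
    Returns an array of shape (len(levels), n_repeats).
    """
    mu = np.asarray(levels, dtype=float)[:, None]
    shape = (mu.shape[0], n_repeats)
    eta = rng.normal(0.0, sd_mult, size=shape)
    eps = rng.normal(0.0, sd_add, size=shape)
    return mu * np.exp(eta) + eps


rng = np.random.default_rng(0)
levels = np.array([1.0, 10.0, 100.0, 1000.0])  # assumed true levels
sd_add, sd_mult = 1.0, 0.1                      # assumed error parameters

y = simulate_repeats(levels, sd_add, sd_mult, n_repeats=2000, rng=rng)

# Assumption: lam is set from the known simulation parameters.
lam = (sd_add / sd_mult) ** 2

# Standard deviation over repeats at each level, before and after transforming.
raw_sd = y.std(axis=1, ddof=1)
glog_sd = glog(y, lam).std(axis=1, ddof=1)

# Ratio of largest to smallest standard deviation across levels.
raw_spread = raw_sd.max() / raw_sd.min()
glog_spread = glog_sd.max() / glog_sd.min()

print("SD per level (raw):  ", np.round(raw_sd, 3))
print("SD per level (glog): ", np.round(glog_sd, 3))
```

On the raw scale the standard deviation over repeats grows with the level, whereas after the transformation it is roughly constant across levels. This is the property that makes unweighted downstream modelling more reasonable.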
