Bagged K-Means Clustering of Metabolome Data

Clustering of metabolomics data can be hampered by noise originating from biological variation, physical sampling error and analytical error. Data analysis methods that are not specifically suited to noisy data yield suboptimal solutions. Bootstrap aggregating (bagging) is a resampling technique that can deal with noise and improve accuracy. This paper demonstrates the possibilities of bagged clustering applied to metabolomics data. The metabolomics data used in this paper are computer-generated with a model of the human red blood cell. This model can be perturbed in several ways; here, inhibition experiments are mimicked by reducing the activity of an enzyme to 10% of its original value. When bagged K-means clustering is compared with ordinary K-means, fewer metabolites switch clusters under the influence of heteroscedastic noise if bagging is used. This favors bagged K-means over ordinary K-means clustering when dealing with noisy metabolomics data. A special validation scheme, independent of the addition of noise, has been devised to demonstrate the positive effects of bagging on clustering.
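
The abstract does not state which bagging scheme is used. Below is a minimal sketch of one common bagged-clustering variant, assuming metabolites are the rows to be clustered and inhibition experiments are the columns. K-means is run on bootstrap resamples of the metabolites, every metabolite is assigned to the nearest center of each resampled run, each run is re-labelled to best match a reference K-means solution, and the final cluster is a majority vote. The stability measure counts metabolites that switch clusters between a clean and a noisy data set. The farthest-first (maximin) initialization, the proportional noise model in the hypothetical `add_heteroscedastic_noise` (parameter `rel_sd`), and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import permutations


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's K-means; farthest-first initialization is an assumption made for reproducibility."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Next center: the row farthest from all centers chosen so far.
        d = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Empty clusters keep their previous center.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(X, centers), centers


def align(labels, reference, k):
    """Relabel `labels` with the permutation of cluster names that best agrees with `reference`."""
    best, best_agree = labels, -1
    for perm in permutations(range(k)):  # brute force over k! permutations
        relabeled = np.array(perm)[labels]
        agree = int(np.sum(relabeled == reference))
        if agree > best_agree:
            best, best_agree = relabeled, agree
    return best


def bagged_kmeans(X, k, n_boot=50, seed=0):
    """Cluster bootstrap resamples, re-label against a reference run, and take a majority vote."""
    rng = np.random.default_rng(seed)
    reference, _ = kmeans(X, k, rng)
    votes = np.zeros((len(X), k), dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))  # rows drawn with replacement
        _, centers = kmeans(X[idx], k, rng)
        labels = align(assign(X, centers), reference, k)  # predict all rows, then re-label
        votes[np.arange(len(X)), labels] += 1
    return votes.argmax(axis=1)


def n_switches(before, after, k):
    """Number of rows (metabolites) whose cluster changes, after matching cluster names."""
    return int(np.sum(align(after, before, k) != before))


def add_heteroscedastic_noise(X, rel_sd, rng):
    """Assumed noise model: standard deviation proportional to |signal|."""
    return X + rel_sd * np.abs(X) * rng.standard_normal(X.shape)


if __name__ == "__main__":
    # Hypothetical data: 60 "metabolites" x 2 "inhibition experiments" in 3 groups.
    rng = np.random.default_rng(1)
    means = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
    X = np.repeat(means, 20, axis=0) + 0.5 * rng.standard_normal((60, 2))
    noisy = add_heteroscedastic_noise(X, 0.3, rng)
    plain_clean, _ = kmeans(X, 3, np.random.default_rng(0))
    plain_noisy, _ = kmeans(noisy, 3, np.random.default_rng(0))
    print("ordinary K-means switches:", n_switches(plain_clean, plain_noisy, 3))
    print("bagged K-means switches:  ",
          n_switches(bagged_kmeans(X, 3), bagged_kmeans(noisy, 3), 3))
```

The brute-force re-labelling in `align` is only practical for a handful of clusters; for larger K, solving the label-matching problem as an assignment problem (for example with the Hungarian algorithm) is the usual replacement. Another published variant combines the bootstrap centers with hierarchical clustering instead of voting.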
