MetaFIND: A feature analysis tool for metabolomics data

Background
Metabolomics, or metabonomics, refers to the quantitative analysis of all metabolites present within a biological sample and is generally carried out using NMR spectroscopy or mass spectrometry. Such analysis produces a set of peaks, or features, indicative of the metabolic composition of the sample, which may be used as a basis for sample classification. Feature selection may be employed to improve classification accuracy or to aid model explanation by establishing a subset of class-discriminating features. Factors such as experimental noise, choice of technique and threshold selection may adversely affect the set of features retrieved. Furthermore, the high dimensionality and multi-collinearity inherent in metabolomics data may exacerbate discrepancies between the features retrieved and those required to fully explain metabolite signatures. Given these issues, the latter in particular, we present the MetaFIND application for 'post-feature selection' correlation analysis of metabolomics data.

Results
In our evaluation we show how MetaFIND may be used to elucidate metabolite signatures from the sets of features selected by diverse techniques over two metabolomics datasets. Importantly, we also show how MetaFIND may augment standard feature selection and aid the discovery of additional significant features, including those that represent novel class-discriminating metabolites. MetaFIND also supports the discovery of higher-level metabolite correlations.

Conclusion
Standard feature selection techniques may fail to capture the full set of relevant features in high-dimensional, multi-collinear metabolomics data. We show that the MetaFIND 'post-feature selection' analysis tool may aid metabolite signature elucidation, feature discovery and the inference of metabolic correlations.
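To make the idea of 'post-feature selection' correlation analysis concrete, the following is a minimal Python sketch, not MetaFIND's actual implementation. It assumes plain Pearson correlation as the association measure and a hypothetical absolute-correlation cutoff (`threshold`); the function names `pearson` and `expand_selection` are illustrative only. Given a samples x features matrix and the feature indices chosen by any feature selection technique, it lists the unselected features that are strongly correlated with each selected one, which is one way multi-collinear partners of a selected feature could be surfaced.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant feature: correlation undefined, treat as uncorrelated
    return cov / (sx * sy)


def expand_selection(
    data: Sequence[Sequence[float]],
    selected: Sequence[int],
    threshold: float = 0.8,
) -> Dict[int, List[int]]:
    """For each selected feature index, list the unselected features whose
    absolute Pearson correlation with it is at least `threshold`.

    Assumption (hypothetical, not MetaFIND's method): Pearson r with a fixed
    cutoff stands in for whatever correlation analysis the tool performs.
    `data` is a samples x features matrix given as a list of rows.
    """
    n_features = len(data[0])
    # Transpose rows into per-feature columns once.
    columns = [[row[j] for row in data] for j in range(n_features)]
    chosen = set(selected)
    result: Dict[int, List[int]] = {}
    for s in selected:
        result[s] = [
            j
            for j in range(n_features)
            if j not in chosen and abs(pearson(columns[s], columns[j])) >= threshold
        ]
    return result


# Usage on a toy 4-sample x 4-feature matrix (illustrative values only).
data = [
    [1.0, 2.0, 5.0, 0.3],
    [2.0, 4.1, 3.0, 0.1],
    [3.0, 6.0, 4.0, 0.4],
    [4.0, 7.9, 1.0, 0.2],
]
print(expand_selection(data, selected=[0], threshold=0.9))  # → {0: [1]}
```

Because the absolute value of r is used, strongly anti-correlated features are reported as well: lowering `threshold` to 0.8 in the toy example also returns feature 2 (r of roughly -0.83 with feature 0).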