Integration of wavelet transform with PCA and ANN for metabolomics data-mining

Principal component analysis (PCA) and artificial neural networks (ANN) are two widely used pattern recognition methods in metabolomics data mining, yet their limitations can be serious obstacles for researchers. In this paper, the wavelet transform (WT) was integrated with PCA and ANN to improve their performance in handling metabolomics data. A dataset was decomposed by wavelets and then reconstructed. The "hard thresholding" algorithm was used, through which the detail information was discarded and the entire "metabolomics image" was reconstructed from the significant information. It was assumed that the most relevant information was captured by this process. Thanks to its ability to denoise data, WT significantly improved the performance of ANN, a non-linear essence-extracting method, in classifying samples. Further integration of WT with PCA showed that WT could greatly enhance the ability of PCA to distinguish one group of samples from another and to identify potential biomarkers. The results highlight WT as a promising approach for bridging the gap between large volumes of data and instructive biological information.
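
The decompose, threshold, and reconstruct step described above can be illustrated with a minimal sketch. The abstract does not state which wavelet family, decomposition depth, or threshold value was used, so the Haar wavelet, the two-level depth, and the threshold of 0.5 below are assumptions chosen only to keep the example short, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal, levels):
    """Orthonormal multi-level Haar transform (assumed wavelet, not the paper's).

    len(signal) must be divisible by 2**levels.
    Returns the coarsest approximation and a list of detail bands,
    ordered from finest (index 0) to coarsest.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        pairs = [(approx[2 * i], approx[2 * i + 1]) for i in range(half)]
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest detail band."""
    current = list(approx)
    for band in reversed(details):
        rebuilt = []
        for a, d in zip(current, band):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        current = rebuilt
    return current

def hard_threshold(coeffs, threshold):
    """Hard thresholding: zero small coefficients, keep the rest unchanged."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]

def wavelet_denoise(signal, levels, threshold):
    """Decompose, hard-threshold the detail bands, then reconstruct."""
    approx, details = haar_decompose(signal, levels)
    kept = [hard_threshold(band, threshold) for band in details]
    return haar_reconstruct(approx, kept)

# Toy "spectrum" with two plateaus plus small fluctuations (made-up values).
spectrum = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.9, 5.1]
denoised = wavelet_denoise(spectrum, levels=2, threshold=0.5)
```

In this toy case every detail coefficient falls below the threshold, so the reconstruction keeps only the two plateau levels. In a workflow like the one the abstract describes, each sample's denoised profile would then be passed to PCA or an ANN in place of the raw profile.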
