MetICA: independent component analysis for high-resolution mass-spectrometry based non-targeted metabolomics

Background: Interpreting non-targeted metabolomics data remains a challenging task. Signals from non-targeted metabolomics studies stem from a combination of biological causes, complex interactions between them, and experimental bias and noise. The resulting data matrix usually contains a huge number of variables and only a few samples, so classical techniques based on nonlinear mapping can be computationally expensive and prone to overfitting. Independent Component Analysis (ICA), as a linear method, could potentially yield more meaningful results than Principal Component Analysis (PCA). However, a major problem with most ICA algorithms is that the output varies between runs, so the result of a single ICA run should be interpreted with caution.

Results: ICA was applied to simulated and experimental mass spectrometry (MS)-based non-targeted metabolomics data, under the hypothesis that the underlying sources are mutually independent. Inspired by the Icasso algorithm, a new ICA method, MetICA, was developed to handle the instability of ICA on complex datasets. Like the original Icasso algorithm, MetICA evaluates the algorithmic and statistical reliability of ICA runs. In addition, MetICA suggests two ways to select the optimal number of model components and gives an order of interpretation for the components obtained.

Conclusions: Correlating the components obtained with prior biological knowledge helps explain how non-targeted metabolomics data reflect biological and technical phenomena. The mass signals related to this information can also be extracted. This novel approach provides meaningful components owing to their independent nature. Furthermore, it offers an innovative basis for model selection: optimizing the number of reliable components instead of trying to fit the data. The current version of MetICA is available at https://github.com/daniellyz/MetICA.
