Constructing Metabolic Association Networks Using High-Dimensional Mass Spectrometry Data

The goal of constructing a metabolic association network is to identify the topology of a metabolic network and thereby better understand the underlying molecular mechanisms. An accurate metabolic association network enables investigation of the functional behavior of metabolites in a cell or tissue. Gaussian graphical model (GGM)-based methods have been widely used in genomics to infer biological networks, but how well the various GGM-based methods construct metabolic association networks remains unknown in metabolomics. The performance of principal component regression (PCR), independent component regression (ICR), shrinkage covariance estimate (SCE), partial least squares regression (PLSR), and extrinsic similarity (ES) methods in constructing metabolic association networks was compared by estimating partial correlation coefficient matrices when the number of variables is larger than the sample size. To do this, the sample size and the network density (complexity) were treated as variables in network construction. Simulation studies show that, in terms of F1 scores, PCR and ICR are more robust to changes in sample size and network density than SCE and PLSR: on the simulated data, the proposed PCR and ICR methods outperform the other methods when the network density is large, while PLSR and SCE perform better when the network density is small. The methods were further applied to experimental metabolomics data acquired from metabolite extracts of mouse liver. On these data, PCR and ICR discover more significant edges and perform better than PLSR and SCE when the discovered edges are evaluated against KEGG pathways. These results suggest that the metabolic network is more complex than the genomic network and, therefore, that PCR and ICR have an advantage over PLSR and SCE in constructing metabolic association networks.
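All of the compared methods end in a matrix of partial correlation coefficients. As background (these are standard GGM identities, not results of this study), for a Gaussian vector X with covariance Σ and precision matrix Ω = Σ^{-1}, the partial correlation of metabolites i and j given all the others can be read either from Ω or from the pair of node-wise regressions of X_i on the remaining variables and of X_j on the remaining variables:

```latex
\Omega = (\omega_{ij}) = \Sigma^{-1}, \qquad
\rho_{ij} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}

X_i = \sum_{j \neq i} \beta_{ij} X_j + \varepsilon_i, \qquad
\beta_{ij} = -\frac{\omega_{ij}}{\omega_{ii}}

\beta_{ij}\,\beta_{ji} = \frac{\omega_{ij}^{2}}{\omega_{ii}\,\omega_{jj}} = \rho_{ij}^{2}
\quad\Longrightarrow\quad
\rho_{ij} = \operatorname{sign}(\beta_{ij})\,\sqrt{\beta_{ij}\,\beta_{ji}}
```

When the number of variables exceeds the sample size, the sample covariance matrix is singular, so Ω cannot be obtained by direct inversion. Our reading, stated as an assumption rather than as the study's exact procedure, is that PCR, ICR and PLSR regularize the node-wise regressions that yield the β coefficients, whereas SCE regularizes Σ itself before it is inverted.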
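The last two steps of such a pipeline, turning node-wise regression coefficients into a symmetric partial correlation matrix and scoring the recovered edges against a known simulated network with the F1 score, can be sketched in plain Python. This is a minimal sketch under several assumptions: the coefficient matrix `beta` (where `beta[i][j]` is the weight of metabolite j when metabolite i is regressed on all others) would in practice come from PCR, ICR or PLSR and is simply hard-coded here; the pairwise rule ρ_ij = sign(β_ij)·sqrt(β_ij·β_ji) is the standard GGM identity; and the fixed cutoff `threshold` is a hypothetical stand-in for whatever significance test the study used to select edges. The helper names are illustrative only.

```python
import math
from itertools import combinations


def partial_correlations(beta):
    """Symmetric partial correlations from node-wise regression coefficients.

    beta[i][j] is the (hypothetical, precomputed) coefficient of variable j
    when variable i is regressed on all other variables. For each pair the
    estimate is rho_ij = sign(beta_ij) * sqrt(beta_ij * beta_ji).
    """
    p = len(beta)
    rho = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for i, j in combinations(range(p), 2):
        prod = beta[i][j] * beta[j][i]
        # Opposite signs (or a zero coefficient) give no consistent estimate;
        # this sketch treats such pairs as having partial correlation 0.
        if prod > 0:
            # Regularized estimates can overshoot, so clip the magnitude at 1.
            r = math.copysign(min(1.0, math.sqrt(prod)), beta[i][j])
            rho[i][j] = rho[j][i] = r
    return rho


def select_edges(rho, threshold):
    """Keep pairs (i, j), i < j, whose |rho_ij| reaches a hypothetical cutoff."""
    return {(i, j) for i, j in combinations(range(len(rho)), 2)
            if abs(rho[i][j]) >= threshold}


def f1_score(predicted, true_edges):
    """F1 score of predicted edges against the true (simulated) edge set.

    Both sets hold (i, j) tuples with i < j so that undirected edges match.
    """
    tp = len(predicted & true_edges)
    fp = len(predicted - true_edges)
    fn = len(true_edges - predicted)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with three metabolites; beta is made up for illustration.
beta = [[0.0, 0.5, 0.1],
        [0.8, 0.0, -0.3],
        [0.1, -0.3, 0.0]]
rho = partial_correlations(beta)
predicted = select_edges(rho, threshold=0.2)  # edges (0, 1) and (1, 2)
true_edges = {(0, 1), (0, 2)}
print(f1_score(predicted, true_edges))  # 0.5
```

Scoring only the upper-triangular pairs counts each undirected edge once. In a simulation study the true edge set would come from the generated network at a given density, so the same F1 computation can be repeated across sample sizes and densities.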
