Sparse network modeling and Metscape‐based visualization methods for the analysis of large‐scale metabolomics data

Motivation: Recent technological advances in mass spectrometry, the development of richer mass spectral libraries, and improved data processing tools have enabled large‐scale metabolic profiling. Biological interpretation of metabolomics studies relies heavily on knowledge‐based tools that contain information about metabolic pathways. Incomplete coverage of different areas of metabolism and a lack of information about non‐canonical connections between metabolites limit the scope of applications of such tools. Furthermore, the presence of a large number of unknown features, which cannot be readily identified but may nonetheless represent bona fide compounds, considerably complicates biological interpretation of the data.

Results: Leveraging recent developments in the statistical analysis of high‐dimensional data, we developed a new Debiased Sparse Partial Correlation (DSPC) algorithm for estimating partial correlation networks and implemented it as a Java‐based CorrelationCalculator program. We also introduce a new version of our previously developed tool Metscape that enables building and visualization of correlation networks. We demonstrate the utility of these tools by constructing biologically relevant networks and by aiding the identification of unknown compounds.

Availability and Implementation: http://metscape.med.umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
