MetaPath: identifying differentially abundant metabolic pathways in metagenomic datasets

Background
Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need to culture individual bacterial members. One major goal of metagenomic studies is to identify specific functional adaptations of microbial communities to their habitats. The functional profile of a sample, and the abundance of each function, can be estimated by mapping metagenomic sequences to the global metabolic network, which consists of thousands of molecular reactions. Here we describe an analytical method (MetaPath) that identifies differentially abundant pathways in metagenomic datasets by combining metagenomic sequence data with prior knowledge of metabolic pathways.

Methods
First, we introduce a scoring function for an arbitrary subnetwork and find the max-weight subnetwork in the global network using a greedy search algorithm. Then we compute two p-values (p_abund and p_struct) using nonparametric approaches to answer two different statistical questions: (1) Is this subnetwork differentially abundant? (2) What is the probability of finding a subnetwork this good by chance, given the data and the network structure? Finally, significant metabolic subnetworks are identified based on these two p-values.

Results
To validate our methods, we designed a simulated metabolic pathway dataset and show that MetaPath outperforms other commonly used approaches. We also demonstrate the power of our methods by analyzing two publicly available metagenomic datasets, and show that the subnetworks identified by MetaPath provide valuable insights into the biological activities of the microbiome.

Conclusions
We have introduced a statistical method for finding significant metabolic subnetworks in metagenomic datasets. Compared with previous methods, results from MetaPath are more robust to noise in the data and, when tested on simulated datasets, have significantly higher sensitivity and specificity. When applied to two publicly available metagenomic datasets, the output of MetaPath is consistent with previous observations and also provides several new insights into the metabolic activity of the gut microbiome. The software is freely available at http://metapath.cbcb.umd.edu.
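
The two ideas named in the Methods paragraph, a greedy search for a high-scoring connected subnetwork and a nonparametric test of whether such a score could arise by chance, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the MetaPath implementation: the subnetwork score is taken to be the plain sum of per-reaction scores, the greedy rule adds the best-scoring neighbor while it is positive, the permutation test shuffles scores across nodes (loosely analogous to the question p_struct answers), and the names `greedy_subnetwork`, `best_subnetwork`, `structure_p_value`, and the toy reactions `R1` to `R5` are all hypothetical.

```python
import random

def greedy_subnetwork(adj, score, seed):
    """Grow a connected subnetwork from `seed`, repeatedly adding the
    highest-scoring neighboring node while that node's score is positive.
    Assumption: subnetwork score = sum of node scores."""
    sub = {seed}
    total = score[seed]
    while True:
        # Nodes adjacent to the current subnetwork but not yet in it.
        frontier = {n for v in sub for n in adj[v]} - sub
        if not frontier:
            break
        # Tie-break on the node name so the result is deterministic.
        best = max(frontier, key=lambda n: (score[n], n))
        if score[best] <= 0:
            break
        sub.add(best)
        total += score[best]
    return sub, total

def best_subnetwork(adj, score):
    """Run the greedy search from every node and keep the best result."""
    best_sub, best_total = set(), float("-inf")
    for seed in sorted(adj):
        sub, total = greedy_subnetwork(adj, score, seed)
        if total > best_total:
            best_sub, best_total = sub, total
    return best_sub, best_total

def structure_p_value(adj, score, n_perm=200, rng=None):
    """Permutation p-value: how often does shuffling the scores over the
    fixed network yield a greedy optimum at least as good as the observed
    one? Uses the add-one correction so the estimate is never zero."""
    rng = rng or random.Random(0)
    _, observed = best_subnetwork(adj, score)
    nodes = sorted(adj)
    values = [score[n] for n in nodes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted = dict(zip(nodes, values))
        _, total = best_subnetwork(adj, permuted)
        if total >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy linear network of five hypothetical reactions with made-up scores.
adj = {
    "R1": ["R2"],
    "R2": ["R1", "R3"],
    "R3": ["R2", "R4"],
    "R4": ["R3", "R5"],
    "R5": ["R4"],
}
score = {"R1": 2.0, "R2": 1.5, "R3": -0.5, "R4": 0.2, "R5": -1.0}

sub, total = best_subnetwork(adj, score)
p_struct_like = structure_p_value(adj, score)
```

On this toy input the search selects the reactions R1 and R2 with a combined score of 3.5. A greedy search of this kind is not guaranteed to find the true maximum-weight connected subnetwork, which is one reason a permutation-based significance check on the search output is useful: the null distribution is built by running the same heuristic on shuffled data.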
