Identifying biologically relevant differences between metagenomic communities

MOTIVATION: Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help distinguish ecological influences from sampling artifacts, but the P-value from such a test is not, on its own, sufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis.

RESULTS: We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using the statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A. phosphatis strains in these related communities, including phosphate metabolism, secretion and metal transport.

AVAILABILITY: Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP

CONTACT: beiko@cs.dal.ca

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
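The point about P-values can be illustrated with a minimal sketch that reports an effect size and confidence interval alongside the P-value for one feature in two samples. This is not STAMP's code or API: the function name `compare_proportions`, the choice of a two-sided two-proportion z-test, the asymptotic (Wald) interval and the counts used below are all assumptions made for illustration only.

```python
import math


def compare_proportions(x1, n1, x2, n2, z_crit=1.96):
    """Compare the relative abundance of one feature in two samples.

    x1, x2: number of sequences assigned to the feature in each sample.
    n1, n2: total number of sequences in each sample.
    z_crit: normal quantile for the interval (1.96 gives roughly 95%).
    Hypothetical helper; assumes the pooled proportion is not 0 or 1.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2  # effect size: difference between proportions

    # Two-sided z-test using the pooled standard error under H0: p1 == p2.
    pooled = (x1 + x2) / (n1 + n2)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_null
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Asymptotic (Wald) confidence interval using the unpooled standard error.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return {"diff": diff, "ci": ci, "p_value": p_value}


# Made-up counts: a very deep pair of samples and a shallow pair.
deep = compare_proportions(10500, 1_000_000, 10000, 1_000_000)
shallow = compare_proportions(30, 1000, 15, 1000)

for name, result in (("deep", deep), ("shallow", shallow)):
    low, high = result["ci"]
    print(f"{name}: diff={result['diff']:.4%} "
          f"CI=({low:.4%}, {high:.4%}) P={result['p_value']:.3g}")
```

With these invented counts, the deeply sequenced pair yields the smaller P-value even though its difference in proportions is far smaller than in the shallow pair, which is why an effect size and its confidence interval are worth reporting next to the P-value.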
