Visual and statistical comparison of metagenomes

BACKGROUND Metagenomics is the study of the genomic content of an environmental sample of microbes. Advances in the through-put and cost-efficiency of sequencing technology is fueling a rapid increase in the number and size of metagenomic datasets being generated. Bioinformatics is faced with the problem of how to handle and analyze these datasets in an efficient and useful way. One goal of these metagenomic studies is to get a basic understanding of the microbial world both surrounding us and within us. One major challenge is how to compare multiple datasets. Furthermore, there is a need for bioinformatics tools that can process many large datasets and are easy to use. RESULTS This article describes two new and helpful techniques for comparing multiple metagenomic datasets. The first is a visualization technique for multiple datasets and the second is a new statistical method for highlighting the differences in a pairwise comparison. We have developed implementations of both methods that are suitable for very large datasets and provide these in Version 3 of our standalone metagenome analysis tool MEGAN. CONCLUSION These new methods are suitable for the visual comparison of many large metagenomes and the statistical comparison of two metagenomes at a time. Nevertheless, more work needs to be done to support the comparative analysis of multiple metagenome datasets. AVAILABILITY Version 3 of MEGAN, which implements all ideas presented in this article, can be obtained from our web site at: www-ab.informatik.uni-tuebingen.de/software/megan. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

[1]  R. B. Jackson,et al.  Toward an ecological classification of soil bacteria. , 2007, Ecology.

[2]  Rob Knight,et al.  UniFrac – An online tool for comparing microbial community diversity in a phylogenetic context , 2006, BMC Bioinformatics.

[3]  Frank Oliver Glöckner,et al.  TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences , 2004, BMC Bioinformatics.

[4]  S. Kravitz,et al.  CAMERA: A Community Resource for Metagenomics , 2007, PLoS biology.

[5]  Mihai Pop,et al.  Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples , 2009, PLoS Comput. Biol..

[6]  J. Handelsman,et al.  Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. , 1998, Chemistry & biology.

[7]  Alexander F. Auch,et al.  Metagenomics to Paleogenomics: Large-Scale Sequencing of Mammoth DNA , 2006, Science.

[8]  I-Min A. Chen,et al.  IMG/M: a data management and analysis system for metagenomes , 2007, Nucleic Acids Res..

[9]  Benjamin J. Raphael,et al.  The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families , 2007, PLoS biology.

[10]  Andreas Wilke,et al.  phylogenetic and functional analysis of metagenomes , 2022 .

[11]  S. Holm A Simple Sequentially Rejective Multiple Test Procedure , 1979 .

[12]  S. Tringe,et al.  Comparative Metagenomics of Microbial Communities , 2004, Science.

[13]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[14]  D. Lipman,et al.  A genomic perspective on protein families. , 1997, Science.

[15]  Yeting Zhang,et al.  The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus). , 2008, Genome research.

[16]  Li Deng,et al.  Differential expression in SAGE: accounting for normal between-library variation , 2003, Bioinform..

[17]  Inna Dubchak,et al.  The integrated microbial genomes (IMG) system , 2005, Nucleic Acids Res..

[18]  E. Mardis,et al.  An obesity-associated gut microbiome with increased capacity for energy harvest , 2006, Nature.

[19]  S. Tringe,et al.  Quantitative Phylogenetic Assessment of Microbial Communities in Diverse Environments , 2007, Science.

[20]  J. Shaffer Multiple Hypothesis Testing , 1995 .

[21]  Naryttza N. Diaz,et al.  Phylogenetic classification of short environmental DNA fragments , 2008, Nucleic acids research.

[22]  Alexander F. Auch,et al.  MEGAN analysis of metagenomic data. , 2007, Genome research.

[23]  Naryttza N. Diaz,et al.  The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes , 2005, Nucleic acids research.

[24]  Ying He,et al.  Signature, a web server for taxonomic characterization of sequence samples using signature genes , 2008, Nucleic Acids Res..

[25]  Mark D. Robinson,et al.  Moderated statistical tests for assessing differences in tag abundance , 2007, Bioinform..

[26]  Nikos Kyrpides,et al.  Genomes OnLine Database (GOLD): a monitor of genome projects world-wide , 2001, Nucleic Acids Res..

[27]  Jun Lu,et al.  BMC Bioinformatics BioMed Central Methodology article Identifying differential expression in multiple SAGE libraries: an , 2005 .

[28]  I. Rigoutsos,et al.  Accurate phylogenetic classification of variable-length DNA fragments , 2007, Nature Methods.

[29]  A. Halpern,et al.  The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific , 2007, PLoS biology.