Current opportunities and challenges in microbial metagenome analysis—a bioinformatic perspective

Metagenomics has become an indispensable tool for studying the diversity and metabolic potential of environmental microbes, most of which have not yet been cultivated. Continual progress in next-generation sequencing makes it possible to generate increasingly large metagenomes and to study multiple metagenomes over time or space. Recently, a new type of holistic ecosystem study has emerged that seeks to combine metagenomics with biodiversity, meta-expression and contextual data. Such ‘ecosystems biology’ approaches have the potential to advance our understanding of environmental microbes to a new level, but they also pose challenges due to increasing data complexity, in particular with respect to bioinformatic post-processing. This mini review addresses selected opportunities and challenges of modern metagenomics from a bioinformatics perspective and is intended to serve as a useful resource for microbial ecologists and bioinformaticians alike.
