A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data

The microbiome can be defined as the community of microorganisms that live in a particular environment. Metagenomics is the practice of sequencing DNA from the genomes of all organisms present in a particular sample, and has become a common method for the study of microbiome population structure and function. Increasingly, researchers are finding novel genes encoded within metagenomes, many of which may be of interest to the biotechnology and pharmaceutical industries. However, such “bioprospecting” requires a suite of sophisticated bioinformatics tools to make sense of the data. This review summarizes the most commonly used bioinformatics tools for the assembly and annotation of metagenomic sequence data with the aim of discovering novel genes.

[1]  David R. Kelley,et al.  Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering , 2011, Nucleic acids research.

[2]  M. Strous,et al.  The Binning of Metagenomic Contigs for Microbial Physiology of Mixed Cultures , 2012, Front. Microbio..

[3]  The UniProt Consortium,et al.  Reorganizing the protein space at the Universal Protein Resource (UniProt) , 2011, Nucleic Acids Res..

[4]  Steven Salzberg,et al.  Clustering metagenomic sequences with interpolated Markov models , 2010, BMC Bioinformatics.

[5]  Benedict Paten,et al.  Improved data analysis for the MinION nanopore sequencer , 2015, Nature Methods.

[6]  Yasubumi Sakakibara,et al.  MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads , 2012, Nucleic acids research.

[7]  F. Raymond,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Ray Meta: scalable de novo metagenome assembly and profiling , 2012 .

[8]  Anders F. Andersson,et al.  Binning metagenomic contigs by coverage and composition , 2014, Nature Methods.

[9]  I-Min A. Chen,et al.  IMG/M: a data management and analysis system for metagenomes , 2007, Nucleic Acids Res..

[10]  Rick L. Stevens,et al.  The RAST Server: Rapid Annotations using Subsystems Technology , 2008, BMC Genomics.

[11]  T. Takagi,et al.  MetaGene: prokaryotic gene finding from environmental genome shotgun sequences , 2006, Nucleic acids research.

[12]  Sergey Koren,et al.  Bambus 2: scaffolding metagenomes , 2011, Bioinform..

[13]  J. McPherson,et al.  Coming of age: ten years of next-generation sequencing technologies , 2016, Nature Reviews Genetics.

[14]  Robin Haw,et al.  Using the Reactome Database , 2012, Current protocols in bioinformatics.

[15]  Hideaki Tanaka,et al.  MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads , 2011, BCB '11.

[16]  T. Itoh,et al.  MetaGeneAnnotator: Detecting Species-Specific Patterns of Ribosomal Binding Site for Precise Gene Prediction in Anonymous Prokaryotic and Phage Genomes , 2008, DNA research : an international journal for rapid publication of reports on genes and genomes.

[17]  David A. Eccles,et al.  MinION Analysis and Reference Consortium: Phase 1 data release and analysis , 2015, F1000Research.

[18]  Kunihiko Sadakane,et al.  MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph , 2014, Bioinform..

[19]  Siu-Ming Yiu,et al.  IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth , 2012, Bioinform..

[20]  Huaiyu Mi,et al.  The InterPro protein families database: the classification resource after 15 years , 2014, Nucleic Acids Res..

[21]  Gautier Koscielny,et al.  Ensembl 2012 , 2011, Nucleic Acids Res..

[22]  Susumu Goto,et al.  KEGG for integration and interpretation of large-scale molecular data sets , 2011, Nucleic Acids Res..

[23]  I-Min A. Chen,et al.  IMG/M 4 version of the integrated metagenome comparative analysis system , 2013, Nucleic Acids Res..

[24]  Kunihiko Sadakane,et al.  Succinct de Bruijn Graphs , 2012, WABI.

[25]  Aaron A. Klammer,et al.  Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data , 2013, Nature Methods.

[26]  Mick Watson,et al.  Bovine Host Genetic Variation Influences Rumen Microbial Methane Production with Best Selection Criterion for Low Methane Emitting and Efficiently Feed Converting Hosts Based on Metagenomic Gene Abundance , 2016, PLoS genetics.

[27]  Chris T. A. Evelo,et al.  WikiPathways: building research communities on biological pathways , 2011, Nucleic Acids Res..

[28]  Charles E. Lawrence,et al.  Sequencing ultra-long DNA molecules with the Oxford Nanopore MinION , 2015, bioRxiv.

[29]  S. Burton,et al.  Metagenomics, gene discovery and the ideal biocatalyst. , 2004, Biochemical Society transactions.

[30]  Afiahayati,et al.  MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning , 2014, DNA research : an international journal for rapid publication of reports on genes and genomes.

[31]  P. Pevzner,et al.  An Eulerian path approach to DNA fragment assembly , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[32]  Daniel D. Sommer,et al.  MetAMOS: a modular and open source metagenomic assembly and analysis pipeline , 2013, Genome Biology.

[33]  M. Borodovsky,et al.  Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. , 2008, Genome research.

[34]  Torsten Seemann,et al.  Prokka: rapid prokaryotic genome annotation , 2014, Bioinform..

[35]  Henning Hermjakob,et al.  The Reactome pathway knowledgebase , 2013, Nucleic Acids Res..

[36]  Lincoln Stein,et al.  Using the Reactome Database , 2004, Current protocols in bioinformatics.

[37]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[38]  S. Salzberg,et al.  Phymm and PhymmBL: Metagenomic Phylogenetic Classification with Interpolated Markov Models , 2009, Nature Methods.

[39]  Katherine H. Huang,et al.  Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning , 2015, Nature Biotechnology.

[40]  Christina Backes,et al.  An integer linear programming approach for finding deregulated subgraphs in regulatory networks , 2011, Nucleic acids research.

[41]  Matthew Fraser,et al.  EBI metagenomics—a new resource for the analysis and archiving of metagenomic data , 2013, Nucleic Acids Res..

[42]  J. T. Dunnen,et al.  Next generation sequencing technology: Advances and applications. , 2014, Biochimica et biophysica acta.

[43]  Tatiana A. Tatusova,et al.  Gene: a gene-centered information resource at NCBI , 2014, Nucleic Acids Res..

[44]  Jonathan Dushoff,et al.  Unsupervised statistical clustering of environmental shotgun sequences , 2009, BMC Bioinformatics.

[45]  D. Antonopoulos,et al.  Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. , 2010, Cold Spring Harbor protocols.

[46]  Mick Watson,et al.  Successful test launch for nanopore sequencing , 2015, Nature Methods.

[47]  S. Salzberg,et al.  Versatile and open software for comparing large genomes , 2004, Genome Biology.

[48]  Jordan A. Fish,et al.  Xander: employing a novel method for efficient gene-targeted metagenomic assembly , 2015, Microbiome.

[49]  O. White,et al.  Environmental Genome Shotgun Sequencing of the Sargasso Sea , 2004, Science.

[50]  Adrian Corduneanu,et al.  Variational Bayesian Model Selection for Mixture Distributions , 2001 .

[51]  Veli Mäkinen,et al.  Normalized N50 assembly metric using gap-restricted co-linear chaining , 2022 .

[52]  M. Schatz,et al.  Hybrid error correction and de novo assembly of single-molecule sequencing reads , 2012, Nature Biotechnology.

[53]  C. Thermes,et al.  Ten years of next-generation sequencing technology. , 2014, Trends in genetics : TIG.

[54]  M. Borodovsky,et al.  Ab initio gene identification in metagenomic sequences , 2010, Nucleic acids research.

[55]  Robert D. Finn,et al.  InterPro in 2011: new developments in the family and domain prediction database , 2011, Nucleic acids research.

[56]  A. Krogh,et al.  A combined transmembrane topology and signal peptide prediction method. , 2004, Journal of molecular biology.

[57]  Mark J. Pallen,et al.  xBASE2: a comprehensive resource for comparative bacterial genomics , 2007, Nucleic Acids Res..

[58]  J. Handelsman,et al.  Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. , 1998, Chemistry & biology.

[59]  Alexander R. Pico,et al.  WikiPathways: Pathway Editing for the People , 2008, PLoS biology.

[60]  S. Karlin,et al.  Prediction of complete gene structures in human genomic DNA. , 1997, Journal of molecular biology.

[61]  Mick Watson,et al.  The rumen microbial metagenome associated with high methane production in cattle , 2015, BMC Genomics.

[62]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[63]  Luis Pedro Coelho,et al.  Structure and function of the global ocean microbiome , 2015, Science.

[64]  Mick Watson,et al.  Meta4: a web application for sharing and annotating metagenomic gene predictions using web services , 2013, Front. Genet..

[65]  Yoshihiro Yamanishi,et al.  KEGG for linking genomes to life and the environment , 2007, Nucleic Acids Res..

[66]  Alexey A. Gurevich,et al.  MetaQUAST: evaluation of metagenome assemblies , 2016, Bioinform..

[67]  Markus Krummenacker,et al.  The MetaCyc database of metabolic pathways and enzymes , 2017, Nucleic acids research.

[68]  S. Tringe,et al.  Metagenomic Discovery of Biomass-Degrading Genes and Genomes from Cow Rumen , 2011, Science.

[69]  Arend Hintze,et al.  Scaling metagenome sequence assembly with probabilistic de Bruijn graphs , 2011, Proceedings of the National Academy of Sciences.

[70]  Po-E Li,et al.  Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform , 2016, bioRxiv.

[71]  Steven Salzberg,et al.  Identifying bacterial genes and endosymbiont DNA with Glimmer , 2007, Bioinform..

[72]  M. Watson,et al.  Illuminating the future of DNA sequencing , 2014, Genome Biology.

[73]  M. Pop,et al.  Sequence assembly demystified , 2013, Nature Reviews Genetics.

[74]  Haixu Tang,et al.  FragGeneScan: predicting genes in short and error-prone reads , 2010, Nucleic acids research.

[75]  J. Silberg,et al.  A transposase strategy for creating libraries of circularly permuted proteins , 2012, Nucleic acids research.