An Integrated Pipeline for Annotation and Visualization of Metagenomic Contigs

Here, we describe MetaErg, a standalone and fully automated metagenome and metaproteome annotation pipeline. Annotation of metagenomes is challenging for three reasons. First, metagenomes contain sequence data from many organisms spanning all domains of life. Second, many of these organisms belong to understudied lineages and encode genes with low similarity to experimentally validated reference genes. Third, assembly and binning are imperfect and sometimes produce artifactual hybrid contigs or genomes. To address these challenges, MetaErg provides graphical summaries of annotation outcomes, both for the complete metagenome and for individual metagenome-assembled genomes (MAGs). It performs a comprehensive annotation of each gene, including taxonomic classification, which supports functional inference despite low similarity to reference genes as well as the detection of potential assembly or binning artifacts. When provided with metaproteome information, it visualizes gene and pathway activity using sequencing coverage and proteomic spectral counts, respectively. For visualization, MetaErg provides an HTML interface that brings all annotation results together and produces sortable and searchable tables, collapsible trees, and other graphical representations that enable intuitive navigation of complex data. MetaErg, implemented in Perl, HTML, and JavaScript, is a fully open-source application distributed under the Academic Free License at https://github.com/xiaoli-dong/metaerg. MetaErg is also available as a Docker image at https://hub.docker.com/r/xiaolidong/docker-metaerg.
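To illustrate how per-gene taxonomic classification can point to potential assembly or binning artifacts, the following is a minimal sketch, not MetaErg's actual implementation. It assumes each gene on a contig already carries a semicolon-separated lineage string (a hypothetical GTDB-style format such as "d__Bacteria;p__Proteobacteria"), and it flags contigs whose classified genes disagree at a chosen rank. The function names `majority_fraction` and `flag_possible_hybrids`, the default rank, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, List


def majority_fraction(lineages: List[str], rank: int) -> float:
    """Return the fraction of classified genes that share the most common taxon at `rank`.

    `lineages` holds one semicolon-separated lineage string per gene (hypothetical
    format, e.g. "d__Bacteria;p__Proteobacteria"). Genes with no name at `rank`
    (missing rank or an empty label such as "p__") are ignored.
    """
    taxa = []
    for lineage in lineages:
        ranks = lineage.split(";")
        if len(ranks) > rank:
            name = ranks[rank].strip()
            if name and not name.endswith("__"):
                taxa.append(name)
    if not taxa:
        return 1.0  # nothing classified at this rank: no evidence of mixing
    _, top_count = Counter(taxa).most_common(1)[0]
    return top_count / len(taxa)


def flag_possible_hybrids(
    contig_genes: Dict[str, List[str]], rank: int = 1, min_fraction: float = 0.7
) -> List[str]:
    """Return IDs of contigs whose dominant taxon at `rank` covers < `min_fraction` of genes.

    rank=1 corresponds to phylum in the assumed "domain;phylum;..." layout;
    min_fraction is a hypothetical threshold, not a MetaErg parameter.
    """
    flagged = [
        contig_id
        for contig_id, lineages in contig_genes.items()
        if majority_fraction(lineages, rank) < min_fraction
    ]
    return sorted(flagged)


if __name__ == "__main__":
    contigs = {
        "contig_1": ["d__Bacteria;p__Proteobacteria"] * 4,
        "contig_2": [
            "d__Bacteria;p__Proteobacteria",
            "d__Bacteria;p__Firmicutes",
            "d__Archaea;p__Euryarchaeota",
            "d__Bacteria;p__Proteobacteria",
        ],
    }
    print(flag_possible_hybrids(contigs))  # ['contig_2']
```

In this example only half of the genes on contig_2 agree at the phylum rank, so it falls below the assumed threshold and is flagged for manual inspection. A disagreement signal like this is only a hint: horizontally transferred genes or sparse reference coverage can also produce mixed assignments.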
