mMGE: a database for human metagenomic extrachromosomal mobile genetic elements

Abstract
Extrachromosomal mobile genetic elements (eMGEs), including phages and plasmids, can move across different microbes and play important roles in genome evolution and in shaping the structure of microbial communities. However, we still know very little about eMGEs, especially their abundances, distributions and putative functions in microbiomes, so a comprehensive description of eMGEs would be of great value. Here we present mMGE, a comprehensive catalog of 517 251 non-redundant eMGEs, including 92 492 plasmids and 424 759 phages, derived from 66 425 human metagenomic samples covering diverse body sites. About half of the eMGEs could be further grouped into 70 074 clusters using relaxed criteria (referred to as eMGE clusters below). We provide extensive annotations of the identified eMGEs, including sequence characteristics, taxonomic affiliation, gene content and prokaryotic hosts. We also calculate the prevalence of each eMGE and eMGE cluster both within and across samples, enabling users to explore putative associations of eMGEs with human phenotypes, as well as their distribution preferences. All eMGE records can be browsed or queried in multiple ways, such as by eMGE cluster, metagenomic sample or associated host. mMGE is equipped with a user-friendly interface and a BLAST server, providing easy access to and queries of all its contents. mMGE is freely available for academic use at https://mgedb.comp-sysbio.org.
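To make the across-sample prevalence described above concrete, here is a minimal Python sketch. It assumes each sample has already been reduced to the set of eMGEs detected in it; how detection is called (for example, a read-mapping coverage threshold) is an assumption and is not specified in the abstract. The function and variable names (`across_sample_prevalence`, `detections`, `cluster_of`) and the sample and eMGE identifiers are hypothetical and not part of mMGE. The rule that a cluster counts as present in a sample when any of its members is detected is likewise an assumption. The sketch covers only the across-sample part, not within-sample prevalence.

```python
from collections import defaultdict


def across_sample_prevalence(detections, cluster_of):
    """Fraction of samples in which each eMGE and each eMGE cluster is detected.

    detections: dict mapping sample ID -> set of eMGE IDs detected in that sample
                (the detection criterion itself is an assumption, not shown here)
    cluster_of: dict mapping eMGE ID -> cluster ID; unclustered eMGEs are absent
    """
    n_samples = len(detections)
    emge_hits = defaultdict(int)
    cluster_hits = defaultdict(int)
    for detected in detections.values():
        for emge in detected:
            emge_hits[emge] += 1
        # Assumed rule: a cluster is counted once per sample if any member is present
        present_clusters = {cluster_of[e] for e in detected if e in cluster_of}
        for cluster in present_clusters:
            cluster_hits[cluster] += 1
    emge_prev = {e: k / n_samples for e, k in emge_hits.items()}
    cluster_prev = {c: k / n_samples for c, k in cluster_hits.items()}
    return emge_prev, cluster_prev


# Hypothetical toy data: four samples, two phages sharing one cluster, one plasmid
detections = {
    "S1": {"phage_1", "plasmid_7"},
    "S2": {"phage_1"},
    "S3": {"phage_2"},
    "S4": set(),
}
cluster_of = {"phage_1": "C1", "phage_2": "C1"}

emge_prev, cluster_prev = across_sample_prevalence(detections, cluster_of)
print(emge_prev["phage_1"])  # 0.5
print(cluster_prev["C1"])    # 0.75
```

Counting a cluster at most once per sample keeps cluster prevalence in the range 0 to 1 even when several members co-occur in the same sample, which is why cluster-level prevalence can exceed that of any single member.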
