Ecology and molecular targets of hypermutation in the global microbiome

Changes in the sequence of an organism’s genome, i.e. mutations, are the raw material of evolution¹. The frequency and location of mutations can be constrained by specific molecular mechanisms, such as diversity-generating retroelements (DGRs)²⁻⁴. DGRs introduce mutations into specific target genes, and have been characterized in several cultivated bacteria and bacteriophages². Whilst a larger diversity of DGR loci has been identified in genomic data from environmental samples, i.e. metagenomes, the ecological role of these DGRs and the evolutionary drivers associated with them remain poorly understood⁵⁻⁷. Here we built and analyzed an extensive dataset of >30,000 metagenome-derived DGRs, and determined that DGRs have a single evolutionary origin and a universal bias towards adenine mutations. We further identified six major lineages of DGRs, each associated with a specific ecological niche defined by a genome type (i.e. whether the DGR is encoded on a viral or a cellular genome), a limited set of taxa and environments, and a distinct type of target. Finally, we leveraged read mapping and metagenomic time series to demonstrate that DGRs are consistently and broadly active and, by a conservative estimate, responsible for >10% of all amino acid changes in some organisms. Overall, these results highlight the strong constraints under which DGRs diversify and expand, and elucidate several distinct roles these elements play in natural communities, including in shaping microbial community structure and function.
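The adenine bias reported above can be made concrete with a minimal sketch. In characterized DGRs, mutations arise when a template repeat (TR) is reverse transcribed and copied into a variable repeat (VR) in the target gene, so TR/VR mismatches concentrated at TR adenines are the expected signature. The sketch assumes, as a simplification that is not the authors' pipeline, that each TR/VR pair is already aligned without gaps on the same strand; the function names and toy sequences are hypothetical. It compares the share of mismatches at TR adenines with the adenine content of the TR, which is the share one would expect if mismatches were not base-specific.

```python
"""Minimal sketch: adenine bias at DGR template/variable repeat mismatches.

Assumption (not the authors' pipeline): TR and VR are already aligned
without gaps and oriented on the same strand. Function names and the toy
sequences below are hypothetical and for illustration only.
"""
from collections import Counter


def mismatch_bases(tr: str, vr: str) -> Counter:
    """Count the TR base at every position where the VR differs from the TR."""
    if len(tr) != len(vr):
        raise ValueError("TR and VR must be aligned to the same length")
    return Counter(t for t, v in zip(tr.upper(), vr.upper()) if t != v)


def adenine_bias(tr: str, vr: str) -> float:
    """Fraction of TR/VR mismatches located at adenines of the TR."""
    counts = mismatch_bases(tr, vr)
    total = sum(counts.values())
    return counts["A"] / total if total else 0.0


def adenine_expectation(tr: str) -> float:
    """Adenine content of the TR: expected fraction if mismatches were not base-specific."""
    tr = tr.upper()
    return tr.count("A") / len(tr) if tr else 0.0


if __name__ == "__main__":
    tr = "GCAACTAATGGAACA"  # hypothetical template repeat
    vr = "GCGACTCATGGTATA"  # hypothetical variable repeat
    print(mismatch_bases(tr, vr))             # Counter({'A': 3, 'C': 1})
    print(round(adenine_bias(tr, vr), 2))     # 0.75
    print(round(adenine_expectation(tr), 2))  # 0.47
```

With real data, a bias well above the TR adenine content points to adenine-specific mutagenesis; a formal test (for instance a binomial test of the adenine mismatch count against that expectation) and indel-aware alignment are deliberately left out of this sketch.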

[1] Daniel J. Blankenberg, et al. Community-led, integrated, reproducible multi-omics with anvi’o, 2020, Nature Microbiology.

[2] I-Min A. Chen, et al. IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses, 2020, Nucleic Acids Research.

[3] Vincent J. Denef, et al. A genomic catalog of Earth’s microbiomes, 2020, Nature Biotechnology.

[4] D. Valentine, et al. Role of diversity-generating retroelements for regulatory pathway tuning in cyanobacteria, 2020, BMC Genomics.

[5] N. Kyrpides, et al. CheckV: assessing the quality of metagenome-assembled viral genomes, 2020, bioRxiv.

[6] Vito Adrian Cantu, et al. PhANNs, a fast and accurate tool and web server to classify phage structural proteins, 2020, bioRxiv.

[7] Vincent J. Denef, et al. Giant virus diversity and host interactions through global metagenomics, 2020, Nature.

[8] Donovan H. Parks, et al. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, 2019, Bioinformatics.

[9] R. Sorek, et al. The pan-immune system of bacteria: antiviral defence as a community resource, 2019, Nature Reviews Microbiology.

[10] Peipei Xu, et al. Nitrate-responsive OBP4-XTH9 regulatory module controls lateral root development in Arabidopsis thaliana, 2019, PLoS Genetics.

[11] T. Dagan, et al. The Effect of Population Bottleneck Size and Selective Regime on Genetic Diversity and Evolvability in Bacteria, 2019, bioRxiv.

[12] Natalia N. Ivanova, et al. Cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth’s biomes, 2019, Nature Microbiology.

[13] Chaochun Wei, et al. Discovery and characterization of the evolution, variation and functions of diversity-generating retroelements using thousands of genomes and metagenomes, 2019, BMC Genomics.

[14] Evelien M. Adriaenssens, et al. Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks, 2019, Nature Biotechnology.

[15] Yuzhen Ye, et al. MyDGR: a server for identification and characterization of diversity-generating retroelements, 2019, Nucleic Acids Research.

[16] S. Rosenberg, et al. What is mutation? A chapter in the series: How microbes “jeopardize” the modern synthesis, 2019, PLoS Genetics.

[17] P. Bork, et al. Interactive Tree Of Life (iTOL) v4: recent updates and new developments, 2019, Nucleic Acids Research.

[18] Milot Mirdita, et al. HH-suite3 for fast remote homology detection and deep protein annotation, 2019, BMC Bioinformatics.

[19] John N. Weinstein, et al. ElemCor: accurate data analysis and enrichment calculation for high-resolution LC-MS stable isotope labeling experiments, 2019, BMC Bioinformatics.

[20] R. Malmstrom, et al. Optimizing de novo genome assembly from PCR-amplified metagenomes, 2018, PeerJ.

[21] I-Min A. Chen, et al. IMG/VR v.2.0: an integrated data management and analysis system for cultivated and environmental viral genomes, 2018, Nucleic Acids Research.

[22] I-Min A. Chen, et al. Genomes OnLine database (GOLD) v.7: updates and new features, 2018, Nucleic Acids Research.

[23] Silvio C. E. Tosatto, et al. The Pfam protein families database in 2019, 2018, Nucleic Acids Research.

[24] I-Min A. Chen, et al. IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes, 2018, Nucleic Acids Research.

[25] F. Zhu, et al. Genome-wide association study reveals novel loci associated with body size and carcass yields in Pekin ducks, 2019, BMC Genomics.

[26] Liqing Zhang, et al. DeepCapTail: A Deep Learning Framework to Predict Capsid and Tail Proteins of Phage Genomes, 2018, bioRxiv.

[27] R. Edwards, et al. A diversity-generating retroelement encoded by a globally ubiquitous Bacteroides phage, 2018, Microbiome.

[28] S. Handa, et al. Crystal structure of a Thermus aquaticus diversity-generating retroelement variable protein, 2018, bioRxiv.

[29] C. Duarte, et al. Sinking particles promote vertical connectivity in the ocean microbiome, 2018, Proceedings of the National Academy of Sciences.

[30] Matthew Z. DeMaere, et al. Genomic variation and biogeography of Antarctic haloarchaea, 2018, Microbiome.

[31] Jeff F. Miller, et al. Template-assisted synthesis of adenine-mutagenized cDNA by a retroelement protein complex, 2018, bioRxiv.

[32] Cindy J. Castelle, et al. Major New Microbial Groups Expand Diversity and Alter our Understanding of the Tree of Life, 2018, Cell.

[33] M. Doebeli, et al. Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem, 2018, Microbiome.

[34] Jeff F. Miller, et al. Diversity-generating retroelements: natural variation, classification and evolution inferred from a large-scale genomic survey, 2017, Nucleic Acids Research.

[35] DGR mutagenic transposition occurs via hypermutagenic reverse transcription primed by nicked template RNA, 2017, Proceedings of the National Academy of Sciences.

[36] Rob Egan, et al. Ecogenomics of virophages and their giant virus hosts assessed through time series metagenomics, 2017, Nature Communications.

[37] K. Hansen, et al. Linear models enable powerful differential activity analysis in massively parallel reporter assays, 2017, BMC Genomics.

[38] Arthur Brady, et al. Strains, functions and dynamics in the expanded Human Microbiome Project, 2017, Nature.

[39] Natalia N. Ivanova, et al. Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data, 2017, Nature Protocols.

[40] Natalia N. Ivanova, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea, 2017, Nature Biotechnology.

[41] Brian C. Thomas, et al. Retroelement guided protein diversification abounds in vast lineages of bacteria and archaea, 2017, Nature Microbiology.

[42] Henrik Nielsen, et al. Predicting Secretory Proteins with SignalP, 2017, Methods in Molecular Biology.

[43] Peer Bork, et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses, 2016, Nature.

[44] Geoffrey D. Hannigan, et al. Evolutionary and functional implications of hypervariable loci within the skin virome, 2016, bioRxiv.

[45] P. Mermelstein, et al. Opposite Effects of mGluR1a and mGluR5 Activation on Nucleus Accumbens Medium Spiny Neuron Dendritic Spine Density, 2016, PLoS ONE.

[46] D. Valentine, et al. Conservation of the C-type lectin fold for accommodating massive sequence variation in archaeal diversity-generating retroelements, 2016, BMC Structural Biology.

[47] P. Straight, et al. Bacterial Communities: Interactions to Scale, 2016, Frontiers in Microbiology.

[48] Danna R. Gifford, et al. Divergent evolution peaks under intermediate population bottlenecks during bacterial experimental evolution, 2016, Proceedings of the Royal Society B: Biological Sciences.

[49] A. Buckling, et al. Evolutionary Ecology of Prokaryotic Immune Mechanisms, 2016, Microbiology and Molecular Biology Reviews.

[50] Heewook Lee, et al. Genomic and Metagenomic Analysis of Diversity-Generating Retroelements Associated with Treponema denticola, 2016, Frontiers in Microbiology.

[51] Daniel H. Huson, et al. Characterization of the Gut Microbial Community of Obese Patients Following a Weight-Loss Intervention Using Whole Metagenome Shotgun Sequencing, 2016, PLoS ONE.

[52] Matthew B. Sullivan, et al. Illuminating structural proteins in viral “dark matter” with metaproteomics, 2016, Proceedings of the National Academy of Sciences.

[53] Dominic Sauvageau, et al. Host receptors for bacteriophage adsorption, 2016, FEMS Microbiology Letters.

[54] Edward A. Sausville, et al. A direct interaction between NQO1 and a chemotherapeutic dimeric naphthoquinone, 2016, BMC Structural Biology.

[55] Dongwan D. Kang, et al. Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations, 2016, The ISME Journal.

[56] Wen J. Li, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation, 2015, Nucleic Acids Research.

[57] F. Freimoser, et al. Tritagonist as a new term for uncharacterised microorganisms in environmental systems, 2015, The ISME Journal.

[58] Tom O. Delmont, et al. Anvi’o: an advanced analysis and visualization platform for ‘omics data, 2015, PeerJ.

[59] Dongwan D. Kang, et al. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities, 2015, PeerJ.

[60] Connor T. Skennerton, et al. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes, 2015, Genome Research.

[61] Matthew B. Sullivan, et al. VirSorter: mining viral signal from microbial genomic data, 2015, PeerJ.

[62] V. Tremaroli, et al. Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life, 2015, Cell Host & Microbe.

[63] D. Valentine, et al. Targeted diversity generation by intraterrestrial archaea and archaeal viruses, 2015, Nature Communications.

[64] A. von Haeseler, et al. IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies, 2014, Molecular Biology and Evolution.

[65] Jeff F. Miller, et al. Diversity-generating Retroelements in Phage and Bacterial Genomes, 2014, Microbiology Spectrum.

[66] Yuzhen Ye. Identification of Diversity-Generating Retroelements in Human Microbiomes, 2014, International Journal of Molecular Sciences.

[67] C. Ané, et al. A linear-time algorithm for Gaussian and non-Gaussian trait evolution models, 2014, Systematic Biology.

[68] Brian Bushnell, et al. BBMap: A Fast, Accurate, Splice-Aware Aligner, 2014.

[69] R Core Team, et al. R: A language and environment for statistical computing, 2014.

[70] Yang Zhang, et al. The I-TASSER Suite: protein structure and function prediction, 2014, Nature Methods.

[71] Jeff F. Miller, et al. Surface display of a massively variable lipoprotein by a Legionella diversity-generating retroelement, 2013, Proceedings of the National Academy of Sciences.

[72] D. Söll, et al. UGA is an additional glycine codon in uncultured SR1 bacteria from the human microbiota, 2013, Proceedings of the National Academy of Sciences.

[73] K. Katoh, et al. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability, 2013, Molecular Biology and Evolution.

[74] Alison S. Waller, et al. Genomic variation landscape of the human gut microbiome, 2012, Nature.

[75] Zhengwei Zhu, et al. CD-HIT: accelerated for clustering the next-generation sequencing data, 2012, Bioinformatics.

[76] Gabor T. Marth, et al. Haplotype-based variant detection from short-read sequencing, 2012, arXiv:1207.3907.

[77] Liam J. Revell, et al. phytools: an R package for phylogenetic comparative biology (and other things), 2012.

[78] Frederic D. Bushman, et al. Hypervariable loci in the human gut virome, 2012, Proceedings of the National Academy of Sciences.

[79] Jeff F. Miller, et al. Target Site Recognition by a Diversity-Generating Retroelement, 2011, PLoS Genetics.

[80] Heng Li, et al. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data, 2011, Bioinformatics.

[81] Sean R. Eddy, et al. Accelerated Profile HMM Searches, 2011, PLoS Computational Biology.

[82] P. Ghosh, et al. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement, 2011, Proceedings of the National Academy of Sciences.

[83] Jonathan Dworkin, et al. Eukaryote-Like Serine/Threonine Kinases and Phosphatases in Bacteria, 2011, Microbiology and Molecular Biology Reviews.

[84] R. Knight, et al. UniFrac: an effective distance metric for microbial community comparison, 2011, The ISME Journal.

[85] Mitchell J. Sullivan, et al. Easyfig: a genome comparison visualizer, 2011, Bioinformatics.

[86] Sylvain Moineau, et al. Bacteriophage resistance mechanisms, 2010, Nature Reviews Microbiology.

[87] Paramvir S. Dehal, et al. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments, 2010, PLoS ONE.

[88] T. Garland, et al. Phylogenetic logistic regression for binary dependent variables, 2010, Systematic Biology.

[89] Ning Ma, et al. BLAST+: architecture and applications, 2009, BMC Bioinformatics.

[90] Hadley Wickham, et al. ggplot2: Elegant Graphics for Data Analysis (2nd Edition), 2017.

[91] Matthew Z. DeMaere, et al. The genomic basis of trophic strategy in marine bacteria, 2009, Proceedings of the National Academy of Sciences.

[92] Toni Gabaldón, et al. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses, 2009, Bioinformatics.

[93] Richard Durbin, et al. Fast and accurate short read alignment with Burrows–Wheeler transform, 2009.

[94] A. C. C. Gibbs, et al. Data Analysis, 2009, Encyclopedia of Database Systems.

[95] S. Zimmerly, et al. A diversity of uncharacterized reverse transcriptases in bacteria, 2008, Nucleic Acids Research.

[96] Tim J. P. Hubbard, et al. Data growth and its impact on the SCOP database: new developments, 2007, Nucleic Acids Research.

[97] Andy Liaw, et al. Classification and Regression by randomForest, 2007.

[98] Nikos Kyrpides, et al. CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats, 2007, BMC Bioinformatics.

[99] Robert C. Edgar, et al. PILER-CR: Fast and accurate identification of CRISPR repeats, 2007, BMC Bioinformatics.

[100] Zhou Yu, et al. Ig-like domains on bacteriophages: a tale of promiscuity and deceit, 2006, Journal of Molecular Biology.

[101] Andrej Sali, et al. The C-type lectin fold as an evolutionary solution for massive sequence variation, 2005, Nature Structural & Molecular Biology.

[102] Conrad C. Huang, et al. UCSF Chimera—A visualization system for exploratory research and analysis, 2004, Journal of Computational Chemistry.

[103] R. Simons, et al. Tropism switching in Bordetella bacteriophage defines a family of diversity-generating retroelements, 2004, Nature.

[104] Robert C. Edgar, et al. MUSCLE: a multiple sequence alignment method with reduced time and space complexity, 2004, BMC Bioinformatics.

[105] G. Tang, et al. Indian Hedgehog: A Mechanotransduction Mediator in Condylar Cartilage, 2004, Journal of Dental Research.

[106] Haruki Nakamura, et al. Announcing the worldwide Protein Data Bank, 2003, Nature Structural Biology.

[107] Anton J. Enright, et al. An efficient algorithm for large-scale detection of protein families, 2002, Nucleic Acids Research.

[108] R. Simons, et al. Reverse Transcriptase-Mediated Tropism Switching in Bordetella Bacteriophage, 2002, Science.

[109] A. Krogh, et al. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes, 2001, Journal of Molecular Biology.