Automated analysis of genomic sequences facilitates high-throughput and comprehensive description of bacteria

The study of microbial communities is hampered by the large fraction of bacteria that remain unknown. Many of these species have in fact been isolated, but lack a validly published name or description. Validating a name for a novel bacterium requires demonstrating that the taxon is unique and describing its properties. The accepted format for this is the protologue, which can be time-consuming to create. Hence, many research fields in microbiology and biotechnology would benefit greatly from new approaches that reduce the workload and harmonise the generation of protologues.

We have developed Protologger, a bioinformatic tool that automatically generates all the readouts necessary for writing a detailed protologue. By producing multiple taxonomic outputs, functional features and ecological analyses from the 16S rRNA gene and genome sequences of a single species, it substantially reduces the time needed to gather the information for describing novel taxa. We demonstrated the usefulness of Protologger by using three published isolate collections to describe 34 novel taxa, encompassing 17 novel species and 17 novel genera, including the automatic generation of ecologically and functionally relevant names. We also highlight the need to use multiple taxonomic delineation methods: although the individual methods can give inconsistent results, a combined approach provides robust placement. Protologger is open source; all scripts and datasets are available, along with a webserver at www.protologger.de.
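The point about combining delineation methods can be illustrated with a minimal sketch. It is not Protologger's code: the record type, the function names and the decision rule are hypothetical, and the cut-offs are values commonly cited in the taxonomy literature (16S rRNA gene identity of 98.7% and ANI of 95% for species; 16S rRNA gene identity of 94.5% and POCP of 50% for genera), assumed here for illustration only.

```python
from dataclasses import dataclass

# Commonly cited cut-offs (assumptions for this sketch, not taken from the abstract).
SPECIES_16S = 98.7  # % 16S rRNA gene identity
SPECIES_ANI = 95.0  # % average nucleotide identity
GENUS_16S = 94.5    # % 16S rRNA gene identity
GENUS_POCP = 50.0   # % of conserved proteins


@dataclass
class ClosestRelative:
    """Similarity of an isolate to its closest described relative (hypothetical record)."""
    rrna_identity: float
    ani: float
    pocp: float


def novelty_votes(rel: ClosestRelative, rank: str) -> dict:
    """Return, per method, whether the value falls below the cut-off for the given rank."""
    if rank == "species":
        return {
            "16S": rel.rrna_identity < SPECIES_16S,
            "ANI": rel.ani < SPECIES_ANI,
        }
    if rank == "genus":
        return {
            "16S": rel.rrna_identity < GENUS_16S,
            "POCP": rel.pocp < GENUS_POCP,
        }
    raise ValueError(f"unsupported rank: {rank}")


def assess(rel: ClosestRelative, rank: str) -> str:
    """Combine the per-method votes into 'novel', 'known' or 'conflicting'."""
    votes = list(novelty_votes(rel, rank).values())
    if all(votes):
        return "novel"
    if not any(votes):
        return "known"
    return "conflicting"


if __name__ == "__main__":
    isolate = ClosestRelative(rrna_identity=97.0, ani=85.0, pocp=60.0)
    print(assess(isolate, "species"), assess(isolate, "genus"))
```

Reporting disagreement as a separate "conflicting" outcome, rather than letting one method override the others, keeps inconsistencies visible so that they can be resolved with further evidence such as a phylogenetic tree.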

[88]  Adam P. Arkin,et al.  FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix , 2009, Molecular biology and evolution.