Pangenomic Definition of Prokaryotic Species and the Phylogenetic Structure of Prochlorococcus spp.

The pangenome is the collection of all orthologous gene groups (OGGs) from a set of genomes. We apply pangenome analysis to propose a definition of prokaryotic species based on the identification of lineage-specific gene sets. Although it is similar to the classical biological definition based on allele flow, it does not rely on DNA similarity levels and does not require analysis of homologous recombination. Hence, this definition is relatively objective and independent of arbitrary thresholds. A systematic analysis of the 110 accepted species with the largest numbers of sequenced strains yields results largely consistent with the existing nomenclature. However, it reveals that the abundant marine cyanobacterium Prochlorococcus marinus should be divided into two species. As a control, we confirm the paraphyletic origin of Yersinia pseudotuberculosis (with the embedded, monophyletic Y. pestis) and of Burkholderia pseudomallei (with B. mallei). We also demonstrate that, by our definition and in accordance with recent studies, Escherichia coli and Shigella spp. are one species.
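The two notions used above, the pangenome as a union of OGGs and a lineage-specific gene set, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy strain and OGG labels, and in particular the strict criterion that a lineage-specific OGG is present in every strain of the lineage and absent from every strain outside it. The abstract does not state the exact criterion, so this should not be read as the authors' procedure.

```python
from typing import Dict, Set


def pangenome(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Return the union of all OGG identifiers over all strains."""
    result: Set[str] = set()
    for oggs in genomes.values():
        result |= oggs
    return result


def lineage_specific_oggs(genomes: Dict[str, Set[str]],
                          lineage: Set[str]) -> Set[str]:
    """Return OGGs found in every strain of `lineage` and in no other strain.

    Assumption: a strict presence/absence rule, chosen only for illustration.
    """
    if not lineage:
        raise ValueError("lineage must contain at least one strain")
    members = [genomes[strain] for strain in lineage]
    outsiders = [oggs for strain, oggs in genomes.items()
                 if strain not in lineage]
    shared = set.intersection(*members)      # present in all lineage members
    seen_outside = set().union(*outsiders)   # present in any outside strain
    return shared - seen_outside


# Hypothetical toy data: strain name -> set of OGG identifiers.
toy = {
    "A1": {"g1", "g2", "g3"},
    "A2": {"g1", "g2", "g4"},
    "B1": {"g1", "g5"},
    "B2": {"g1", "g5", "g3"},
}
print(sorted(pangenome(toy)))
print(sorted(lineage_specific_oggs(toy, {"A1", "A2"})))
```

In this toy input, g1 is shared by all strains and therefore marks neither lineage, while g3 occurs in both groups and is excluded as well. A lineage with a non-empty result under such a rule would be a candidate species in the sense sketched here.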
