Complete genome sequence of DSM 30083T, the type strain (U5/41T) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy

Although Escherichia coli is the most widely studied bacterial model organism and is often considered the model bacterium per se, its type strain has until now been overlooked in microbial genomics. As part of the Genomic Encyclopedia of Bacteria and Archaea project, we here describe the features of E. coli DSM 30083T together with its genome sequence and annotation as well as novel aspects of its phenotype. The 5,038,133 bp genome sequence includes 4,762 protein-coding genes and 175 RNA genes as well as a single plasmid. The affiliation of a set of 250 genome-sequenced E. coli strains, Shigella and outgroup strains to the type strain of E. coli was investigated using digital DNA:DNA hybridization (dDDH) similarities and differences in genomic G+C content. As in the majority of previous studies, the results show Shigella spp. embedded within E. coli and in most cases forming a single subgroup of it. Phylogenomic trees also recover the proposed E. coli phylotypes as monophyla, with minor exceptions, and place DSM 30083T in phylotype B2 with E. coli S88 as its closest neighbor. The widely used lab strain K-12 differs strongly from the type strain not only genomically but also physiologically. The phylotypes do not, however, show a uniform level of character divergence as measured using dDDH; an alternative arrangement is therefore proposed and discussed in the context of bacterial subspecies. Analyses of the genome sequences of a large number of E. coli strains and of strains from > 100 other bacterial genera indicate a value of 79-80% dDDH as the most promising threshold for delineating subspecies, which in turn suggests the presence of five subspecies within E. coli.
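To illustrate how a dDDH threshold can partition strains into groups, the following is a minimal sketch and not the procedure used in the study. It assumes single-linkage grouping (two strains share a group whenever a chain of pairwise dDDH values at or above the threshold connects them), takes 79% as the lower end of the 79-80% range, and uses the conventional 70% species boundary for comparison. The strain names and dDDH values are hypothetical placeholders, not data from the paper.

```python
from itertools import combinations

# Thresholds in percent dDDH. 70% is the conventional species boundary;
# 79% is the lower end of the 79-80% subspecies range given in the abstract.
SPECIES_THRESHOLD = 70.0
SUBSPECIES_THRESHOLD = 79.0


def cluster_by_threshold(strains, ddh, threshold):
    """Group strains by single linkage: join two strains if dDDH >= threshold.

    `ddh` maps unordered strain pairs (stored in either order) to percent dDDH.
    Single linkage is an assumption made for this sketch.
    """
    parent = {s: s for s in strains}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(strains, 2):
        value = ddh.get((a, b), ddh.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical strains and dDDH values, for illustration only.
strains = ["s1", "s2", "s3", "s4"]
ddh = {
    ("s1", "s2"): 92.0,
    ("s1", "s3"): 74.0,
    ("s2", "s3"): 75.5,
    ("s1", "s4"): 73.0,
    ("s2", "s4"): 72.0,
    ("s3", "s4"): 85.0,
}

species = cluster_by_threshold(strains, ddh, SPECIES_THRESHOLD)
subspecies = cluster_by_threshold(strains, ddh, SUBSPECIES_THRESHOLD)
print(species)
print(subspecies)
```

With these toy values all four strains fall into one group at the species threshold and split into two groups at the subspecies threshold. Real dDDH estimates carry confidence intervals, so pairs close to a threshold would need more careful treatment than this hard cut-off.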
