Flexibility and Symmetry of Prokaryotic Genome Rearrangement Reveal Lineage-Associated Core-Gene-Defined Genome Organizational Frameworks

ABSTRACT The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegmental cGOFs were further classified as symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOFs provides efficient indexes for scaffold orientation, as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis.

IMPORTANCE Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. It is important to know whether there is a set of genes that is not only conserved in position among isolates but also functionally essential for a given species, and to further evaluate the stability or flexibility of such genome structures across lineages. Based on a large body of multi-isolate pangenomic data, our analysis reveals that a subset of core genes is organized into a core-gene-defined genome organizational framework, or cGOF. Furthermore, the lineage-associated cGOFs among Gram-positive and Gram-negative bacteria behave differently: the former, composed of 2 to 4 segments, have their fragments symmetrically rearranged around the origin-terminus axis, whereas the latter show more complex segmentation and are partitioned asymmetrically into chromosomal structures. The definition of cGOFs provides new insights into prokaryotic genome organization and efficient guidance for genome assembly and analysis.
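
As an illustration of what grouping core genes into syntenic blocks across isolates could look like, the following is a minimal sketch and not the authors' pipeline. It assumes each isolate is given as a linear list of nonzero signed integers (hypothetical core-gene identifiers, with the sign standing for the strand), ignores chromosome circularity, and treats an adjacency as conserved only if it occurs in every isolate, in either orientation. The function names `adjacencies` and `syntenic_blocks` are invented for this example.

```python
def adjacencies(order):
    """Return the signed adjacencies of one gene order (linear, not circular)."""
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read from the opposite strand
    return adj


def syntenic_blocks(genomes):
    """Split the first genome's core-gene order into maximal runs whose
    internal adjacencies are present in every genome (assumed criterion)."""
    reference = genomes[0]
    shared = set.intersection(*(adjacencies(g) for g in genomes))
    blocks, current = [], [reference[0]]
    for a, b in zip(reference, reference[1:]):
        if (a, b) in shared:
            current.append(b)       # adjacency conserved: extend the block
        else:
            blocks.append(current)  # adjacency broken somewhere: start a new block
            current = [b]
    blocks.append(current)
    return blocks


# Toy data (made up): three isolates sharing six core genes.
isolates = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, -6, -5, -4],  # genes 4-5-6 inverted as a unit
    [4, 5, 6, 1, 2, 3],     # the two groups swapped in position
]
print(syntenic_blocks(isolates))
```

In this toy input the adjacency between genes 3 and 4 is broken in the second and third isolates, so the reference order splits into two blocks that each survive inversion and translocation intact. A real analysis would also need ortholog assignment, circular chromosomes, and some tolerance for isolates that are exceptions.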

[1]  Eduardo P C Rocha,et al.  The replication-related organization of bacterial genomes. , 2004, Microbiology.

[2]  Andrew Travers,et al.  Gene order and chromosome dynamics coordinate spatiotemporal gene expression during the bacterial growth cycle , 2011, Proceedings of the National Academy of Sciences.

[3]  Anat Kreimer,et al.  The evolution of modularity in bacterial metabolic networks , 2008, Proceedings of the National Academy of Sciences.

[4]  Sagi Snir,et al.  Phylo SI: a new genome-wide approach for prokaryotic phylogeny , 2013, Nucleic acids research.

[5]  N. Campo,et al.  Chromosomal constraints in Gram‐positive bacteria revealed by artificial inversions , 2004, Molecular microbiology.

[6]  Jaideep P. Sundaram,et al.  Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[7]  F. Rodríguez-Valera,et al.  The bacterial pan-genome:a new paradigm in microbiology. , 2010, International microbiology : the official journal of the Spanish Society for Microbiology.

[8]  Peer Bork,et al.  Genome-Wide Experimental Determination of Barriers to Horizontal Gene Transfer , 2007, Science.

[9]  R. Beever,et al.  Comparison of the complete genome sequence of two closely related isolates of ‘Candidatus Phytoplasma australiense’ reveals genome plasticity , 2013, BMC Genomics.

[10]  D. Leach,et al.  Bacterial Genome Instability , 2014, Microbiology and Molecular Reviews.

[11]  E. Rocha Is there a role for replication fork asymmetry in the distribution of genes in bacterial genomes? , 2002, Trends in microbiology.

[12]  Pietro Liò,et al.  Short and long-term genome stability analysis of prokaryotic genomes , 2013, BMC Genomics.

[13]  Daniel B. Sloan,et al.  The Evolution of Genomic Instability in the Obligate Endosymbionts of Whiteflies , 2013, Genome biology and evolution.

[14]  Jun Yu,et al.  Comparative Analysis of Eubacterial DNA Polymerase III Alpha Subunits , 2007, Genom. Proteom. Bioinform..

[15]  Aaron A. Klammer,et al.  Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data , 2013, Nature Methods.

[16]  H. Matsuda,et al.  Biased biological functions of horizontally transferred genes in prokaryotic genomes , 2004, Nature Genetics.

[17]  A. Danchin,et al.  Organised Genome Dynamics in the Escherichia coli Species Results in Highly Diverse Adaptive Paths , 2009, PLoS genetics.

[18]  E. Feil Small change: keeping pace with microevolution , 2004, Nature Reviews Microbiology.

[19]  O. Espéli,et al.  Chromosome Structuring Limits Genome Plasticity in Escherichia coli , 2007, PLoS genetics.

[20]  M. Lizotte-Waniewski,et al.  Population genomics and the bacterial species concept. , 2009, Methods in molecular biology.

[21]  S. Salzberg,et al.  Evidence for symmetric chromosomal inversions around the replication origin in bacteria , 2000, Genome Biology.

[22]  O. Sliusarenko,et al.  Spatial organization of the flow of genetic information in bacteria , 2010, Nature.

[23]  Temple F. Smith,et al.  The origin and evolution of the ribosome , 2008, Biology Direct.

[24]  James J. Davis,et al.  Similarity of genes horizontally acquired by Escherichia coli and Salmonella enterica is evidence of a supraspecies pangenome , 2011, Proceedings of the National Academy of Sciences.

[25]  L. Cui,et al.  Coordinated phenotype switching with large-scale chromosome flip-flop inversion observed in bacteria , 2012, Proceedings of the National Academy of Sciences.

[26]  E. Cox,et al.  Gene location and DNA density determine transcription factor distributions in Escherichia coli , 2012, Molecular systems biology.

[27]  A. Ferré-D’Amaré Use of a coenzyme by the glmS ribozyme-riboswitch suggests primordial expansion of RNA chemistry by small molecules , 2011, Philosophical Transactions of the Royal Society B: Biological Sciences.

[28]  B. Snel,et al.  Toward Automatic Reconstruction of a Highly Resolved Tree of Life , 2006, Science.

[29]  Eduardo P C Rocha,et al.  Gene essentiality determines chromosome organisation in bacteria. , 2003, Nucleic acids research.

[30]  Asa Ben-Hur,et al.  Multiple instance learning of Calmodulin binding sites , 2012, Bioinform..

[31]  Jun Li,et al.  Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance , 2012, BMC Bioinformatics.

[32]  C. Pál,et al.  Adaptive evolution of bacterial metabolic networks by horizontal gene transfer , 2005, Nature Genetics.

[33]  I. Miklós,et al.  Dynamics of Genome Rearrangement in Bacterial Populations , 2008, PLoS genetics.

[34]  Songnian Hu,et al.  On the molecular mechanism of GC content variation among eubacterial genomes , 2012, Biology Direct.

[35]  Jun Yu,et al.  PGAP: pan-genomes analysis pipeline , 2011, Bioinform..