Evolutionary pressures on simple sequence repeats in prokaryotic coding regions

Simple sequence repeats (SSRs) are indel mutational hotspots in genomes. In prokaryotes, SSR loci can cause phase variation, a microbial survival strategy that relies on stochastic, reversible on–off switching of gene activity. By analyzing multiple strains of 42 fully sequenced prokaryotic species, we measure the relative variability and density distribution of SSRs in coding regions. We demonstrate that repeat type strongly influences indel mutation rates, and that the most mutable repeat types are the most strongly avoided across genomes. We characterize SSR density and variability as a function of N→C position along protein sequences. Using codon-shuffling algorithms that preserve amino acid sequence, we assess evolutionary pressures on SSRs. We find that coding sequences suppress repeats in the middle of proteins and enrich repeats near the termini, yielding U-shaped SSR density curves. We show that for many species this characteristic shape can be attributed to purely biophysical constraints of protein structure. In multiple cases, however, particularly in certain pathogenic bacteria, we observe enrichment of SSRs near protein N-termini significantly beyond the expectation based on structural constraints. Such N-terminal enrichment increases the probability that a frameshift results in a non-functional protein, suggesting that these species may evolutionarily tune SSR positions in coding regions to facilitate phase variation.
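The abstract does not specify the codon-shuffling algorithms, so the following is only a minimal sketch of one way such a null model could be built. It permutes codons among positions that encode the same amino acid, which leaves both the protein sequence and the overall codon usage unchanged. The function names (`translate`, `shuffle_synonymous`, `homopolymer_runs`), the use of the standard genetic code, the restriction to mononucleotide runs, and the `min_len` threshold are all assumptions made for illustration.

```python
import random
import re
from collections import defaultdict

# Standard genetic code in TCAG order (assumed here; '*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence into a one-letter amino acid string."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def shuffle_synonymous(seq, rng):
    """Permute codons among positions that encode the same amino acid."""
    codons = split_codons(seq)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def homopolymer_runs(seq, min_len=5):
    """Return (start, length) for each mononucleotide run of at least min_len."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start(), len(m.group(0))) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    gene = "ATGAAAAAAGGTGGCAAGTAA"  # toy sequence, not real data
    null = shuffle_synonymous(gene, rng)
    print(translate(gene) == translate(null))
    print(homopolymer_runs(gene), homopolymer_runs(null))
```

Comparing run counts and positions in the observed sequence against many shuffled versions gives an expectation for SSR density that is conditioned on the protein sequence. Departures from that expectation are then candidates for selection acting on the repeats themselves rather than on the encoded amino acids.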
