Short template switch events in human evolution cause complex mutation patterns

Background. Resequencing efforts are uncovering the extent of genetic variation in humans and provide data to study the evolutionary processes shaping our genome. One recurring puzzle in both intra- and inter-species studies is the high frequency of complex mutations comprising multiple nearby base substitutions or insertion-deletions. We devised a generalized mutation model to study the role of template switch events in the origin of such mutation clusters.

Results. Applied to the human genome, our model detects thousands of template switch events during the evolution of human and chimp from their common ancestor, and hundreds of events between two independently sequenced human genomes. While many of these are consistent with the inter-strand template switch mechanism proposed for bacteria, our model also identifies new types of mutations that create short inversions, some flanked by paired inverted repeats. This local template switch process creates numerous complex mutation patterns, including secondary structures, and explains multi-nucleotide mutations and compensatory substitutions without invoking positive selection. Detection of these complex mutations with current resequencing methodologies is difficult and we find many erroneous variant annotations in human reference data.

Conclusions. Previously unexplained short template switch events account for a large number of complex mutation patterns in human evolution, without invoking complicated and speculative mechanisms or implausible coincidence. We show that clustered sequence differences are challenging for mapping and variant calling methods. Template switch events such as those we have uncovered may have been neglected as an explanation for complex mutations because of biases in commonly used analyses. Incorporation of our model into analysis pipelines will lead to improved understanding of genome variation and evolution.
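To illustrate how a single template switch event can look like several independent nearby substitutions, the following minimal Python sketch models only the simplest case mentioned in the Results, a short in-place inversion, in which a stretch of sequence is replaced by its own reverse complement. It is not the generalized mutation model of the paper. The function names, the toy sequence, and the coordinates are hypothetical and chosen purely for illustration.

```python
# Toy illustration (an assumption, not the authors' model): a short in-place
# inversion replaces a region with its reverse complement.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def invert_in_place(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] with its reverse complement (hypothetical helper)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def mismatch_positions(a: str, b: str) -> list[int]:
    """List the 0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


ancestor = "ACGTTGCAAGGCTTACCGAT"               # made-up sequence
descendant = invert_in_place(ancestor, 8, 14)   # one event over six bases

print(descendant)
print(mismatch_positions(ancestor, descendant))
```

In this toy case one event leaves two mismatches three bases apart. A base-by-base comparison of the two sequences would report them as two separate substitutions, which is the kind of clustered difference the abstract attributes to template switching.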
