The origin, evolution, and functional impact of short insertion–deletion variants identified in 179 human genomes

Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than that of SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. Indels are also enriched in associations with gene expression, and we find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated into existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variants underlying some of these associations may be indels.
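To give a sense of scale for the heterogeneity reported above, the following minimal sketch converts "43%-48% of indels in 4.03% of the genome" into a relative indel density inside versus outside that fraction. It is back-of-the-envelope arithmetic only, not the analysis performed in the study: the function name `fold_enrichment` is hypothetical, and the calculation assumes that the two percentages can be compared directly as fractions of the same genome.

```python
def fold_enrichment(indel_fraction: float, genome_fraction: float) -> float:
    """Relative indel density inside a region versus outside it.

    indel_fraction  -- share of all indels that fall inside the region
    genome_fraction -- share of the genome that the region covers
    (Illustrative helper, not taken from the paper.)
    """
    density_inside = indel_fraction / genome_fraction
    density_outside = (1.0 - indel_fraction) / (1.0 - genome_fraction)
    return density_inside / density_outside


# Figures quoted in the abstract: 43%-48% of indels in 4.03% of the genome.
low = fold_enrichment(0.43, 0.0403)
high = fold_enrichment(0.48, 0.0403)
print(f"indel density is roughly {low:.0f}- to {high:.0f}-fold higher in the hotspot fraction")
```

Under these assumptions the hotspot fraction carries roughly 18 to 22 times the indel density of the rest of the genome.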
