Functional constraint and small insertions and deletions in the ENCODE regions of the human genome

Background: We describe the distribution of indels in the 44 Encyclopedia of DNA Elements (ENCODE) regions (about 1% of the human genome) and evaluate the potential contributions of small insertion and deletion polymorphisms (indels) to human genetic variation. We relate indels to known genomic annotation features and measures of evolutionary constraint.

Results: Indel rates are observed to be reduced approximately 20-fold to 60-fold in exonic regions, 5-fold to 10-fold in sequence that exhibits high evolutionary constraint in mammals, and up to 2-fold in some classes of regulatory elements (for instance, formaldehyde assisted isolation of regulatory elements [FAIRE] and hypersensitive sites). In addition, some noncoding transcription and other chromatin mediated regulatory sites also have reduced indel rates. Overall indel rates for these data are estimated to be smaller than single nucleotide polymorphism (SNP) rates by a factor of approximately 2, with both rates measured as base pairs per 100 kilobases to facilitate comparison.

Conclusion: Indel rates exhibit a broadly similar distribution across genomic features compared with SNP density rates, with a reduction in rates in coding transcription and evolutionarily constrained sequence. However, unlike indels, SNP rates do not appear to be reduced in some noncoding functional sequences, such as pseudo-exons, and FAIRE and hypersensitive sites. We conclude that indel rates are greatly reduced in transcribed and evolutionarily constrained DNA, and discuss why indel (but not SNP) rates appear to be constrained at some regulatory sites.
