Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes

A Deep Look Into Our Genes

Recent debates have focused on the degree of genetic variation and its impact upon health at the genomic level in humans (see the Perspective by Casals and Bertranpetit). Tennessen et al. (p. 64, published online 17 May), looking at all of the protein-coding genes in the human genome, and Nelson et al. (p. 100, published online 17 May), looking at genes that encode drug targets, address this question through deep sequencing efforts on samples from multiple individuals. The findings suggest that most human variation is rare and not shared between populations, and that rare variants are likely to play a role in human health.

Most functionally consequential variants in protein-coding genes are rare and, thus, difficult to find. As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ~313 genes per genome, and ~95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.
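The figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' analysis: the function names are hypothetical, and the second calculation assumes that each of the 2440 individuals contributes two chromosomes and that the 0.5% minor allele frequency threshold is applied to the pooled sample.

```python
def predicted_functional_snvs(snvs_per_person: int, functional_fraction: float) -> float:
    """Expected count of SNVs per person predicted to affect protein function."""
    return snvs_per_person * functional_fraction


def minor_allele_copies_at_threshold(n_individuals: int, maf: float) -> float:
    """Minor allele copies in a diploid sample for a variant at frequency maf.

    Assumes two chromosomes per individual (an illustrative simplification).
    """
    return 2 * n_individuals * maf


# 2.3% of 13,595 SNVs per person (values taken from the abstract)
functional = predicted_functional_snvs(13_595, 0.023)

# A variant at the 0.5% rarity threshold in 2440 individuals (4880 chromosomes)
copies = minor_allele_copies_at_threshold(2_440, 0.005)

print(f"Predicted functional SNVs per person: {functional:.1f}")
print(f"Minor allele copies at MAF 0.5%: {copies:.1f}")
```

The first value comes to about 313, in line with the figure reported in the abstract. The second shows that even a variant sitting right at the rarity threshold appears on only about 24 chromosomes in a sample of this size, which gives some intuition for why associating rare variants with complex traits calls for large samples.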
