Deleterious Alleles in the Human Genome Are on Average Younger Than Neutral Alleles of the Same Frequency

Large-scale population sequencing studies provide a complete picture of human genetic variation within the studied populations. A key challenge is to identify, among the myriad alleles, those variants that affect molecular function, phenotypes, and reproductive fitness. Most non-neutral variation consists of deleterious alleles segregating at low population frequency due to incessant mutation. To date, studies characterizing selection against deleterious alleles have been based either on allele frequency (testing for a relative excess of rare alleles) or on the ratio of polymorphism to divergence (testing for a relative increase in the number of polymorphic alleles). Here, starting from Maruyama's theoretical prediction (Maruyama T (1974), Am J Hum Genet 26:669–673) that a (slightly) deleterious allele is, on average, younger than a neutral allele segregating at the same frequency, we devised an approach to characterize selection based on allelic age. Unlike existing methods, it compares sets of neutral and deleterious sequence variants at the same allele frequency. When applied to human sequence data from the Genome of the Netherlands Project, our approach distinguishes low-frequency coding non-synonymous variants from synonymous and non-coding variants at the same allele frequency, and it discriminates between sets of variants independently predicted to be benign or damaging for protein structure and function. The results confirm the abundance of slightly deleterious coding variation in humans.
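The central idea, comparing allele ages between functional classes only among variants that share the same allele frequency, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: each variant is assumed to already carry an age estimate from some external estimator (not shown), allele frequency is represented by the derived-allele count in the sample, and significance is assessed with a stratified permutation test chosen here purely for illustration. The names `Variant`, `stratify`, `matched_difference`, and `permutation_pvalue`, the category labels, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Variant:
    allele_count: int  # derived-allele count in the sample; defines the frequency stratum
    age: float         # allele-age estimate, assumed to come from an external estimator
    category: str      # hypothetical functional label, e.g. "synonymous"


def stratify(variants):
    """Group variants by allele count so that ages are only compared at equal frequency."""
    strata = defaultdict(list)
    for v in variants:
        strata[v.allele_count].append(v)
    return dict(strata)


def matched_difference(strata, test, reference):
    """Weighted mean, over strata, of mean_age(test) - mean_age(reference).

    Strata lacking either category are skipped; each remaining stratum is
    weighted by its number of test-category variants.
    """
    total, weight = 0.0, 0
    for members in strata.values():
        t = [v.age for v in members if v.category == test]
        r = [v.age for v in members if v.category == reference]
        if t and r:
            total += len(t) * (mean(t) - mean(r))
            weight += len(t)
    if weight == 0:
        raise ValueError("no allele-count stratum contains both categories")
    return total / weight


def permutation_pvalue(variants, test, reference, n_perm=2000, seed=0):
    """Return (observed difference, one-sided p-value) for 'test alleles are younger'.

    Category labels are shuffled only within each allele-count stratum, which
    keeps the comparison frequency-matched under the null hypothesis.
    """
    rng = random.Random(seed)
    strata = stratify(v for v in variants if v.category in (test, reference))
    observed = matched_difference(strata, test, reference)
    hits = 0
    for _ in range(n_perm):
        permuted = {}
        for count, members in strata.items():
            labels = [v.category for v in members]
            rng.shuffle(labels)
            permuted[count] = [Variant(v.allele_count, v.age, label)
                               for v, label in zip(members, labels)]
        # Count permutations at least as extreme (as negative) as the observed value.
        if matched_difference(permuted, test, reference) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data, invented purely for illustration (ages in arbitrary units).
toy = [
    Variant(1, 10.0, "synonymous"), Variant(1, 12.0, "synonymous"),
    Variant(1, 4.0, "nonsynonymous"), Variant(1, 6.0, "nonsynonymous"),
    Variant(2, 20.0, "synonymous"), Variant(2, 9.0, "nonsynonymous"),
]
diff, p = permutation_pvalue(toy, "nonsynonymous", "synonymous", n_perm=500)
print(f"frequency-matched age difference: {diff:.2f}, p = {p:.3f}")  # difference prints as -7.67
```

Shuffling labels only within each allele-count stratum preserves the frequency matching under the null hypothesis. Pooling all variants instead would confound the comparison, because expected allele age increases with allele frequency and the functional classes differ in their frequency spectra.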

[1] B. Rannala, et al. High-resolution multipoint linkage-disequilibrium mapping in the context of a human genome sequence. 2001, American Journal of Human Genetics.

[2] M. DePristo, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. 2010, Genome Research.

[3] Shamil R. Sunyaev, et al. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 2007, American Journal of Human Genetics.

[4] J. Stamatoyannopoulos, et al. Power of deep, all-exon resequencing for discovery of human trait genes. 2009, Proceedings of the National Academy of Sciences.

[5] Life Technologies, et al. A map of human genome variation from population-scale sequencing. 2011.

[6] M. Slatkin, et al. Estimating the age of alleles by use of intraallelic variability. 1997, American Journal of Human Genetics.

[7] Warren C. Lathe, et al. Prediction of deleterious human alleles. 2001, Human Molecular Genetics.

[8] J. Todd, et al. Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. 2009, Science.

[9] Ryan D. Hernandez, et al. A flexible forward simulator for populations subject to selection and demography. 2008, Bioinformatics.

[10] M. Slatkin, et al. Simulating genealogies of selected alleles in a population of variable size. 2001, Genetical Research.

[11] Ryan D. Hernandez, et al. Simultaneous inference of selection and population growth from patterns of variation in the human genome. 2005, Proceedings of the National Academy of Sciences of the United States of America.

[12] Adam Kiezun, et al. Computational and statistical approaches to analyzing variants identified by exome sequencing. 2011, Genome Biology.

[13] Paul Flicek, et al. The functional spectrum of low-frequency coding variation. 2011, Genome Biology.

[14] B. Browning, et al. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. 2009, American Journal of Human Genetics.

[15] T. Maruyama, et al. The age of a rare mutant gene in a large population. 1974, American Journal of Human Genetics.


[17] Jonathan C. Cohen, et al. Sequence variations in PCSK9, low LDL, and protection against coronary heart disease. 2006, The New England Journal of Medicine.

[18] Justin C. Fay, et al. Positive and negative selection on the human genome. 2001, Genetics.

[19] M. DePristo, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. 2011, Nature Genetics.

[20] J. Akey, et al. Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. 2002, American Journal of Human Genetics.

[21] P. Bork, et al. A method and server for predicting damaging missense mutations. 2010, Nature Methods.

[22] Richard Durbin, et al. Fast and accurate short read alignment with Burrows–Wheeler transform. 2009.

[23] A. Eyre-Walker, et al. The distribution of fitness effects of new deleterious amino acid mutations in humans. 2006, Genetics.

[24] Pardis C. Sabeti, et al. Detecting recent positive selection in the human genome from haplotype structure. 2002, Nature.

[25] Pardis C. Sabeti, et al. Genome-wide detection and characterization of positive selection in human populations. 2007, Nature.

[26] J. Pritchard, et al. A map of recent positive selection in the human genome. 2006, PLoS Biology.

[27] E. Génin, et al. Estimating the age of rare disease mutations: the example of Triple-A syndrome. 2004, Journal of Medical Genetics.

[28] M. Kimura, et al. Moments for sum of an arbitrary function of gene frequency along a stochastic path of gene frequency change. 1975, Proceedings of the National Academy of Sciences of the United States of America.

[29] D. Altshuler, et al. A map of human genome variation from population-scale sequencing. 2010, Nature.

[30] Ryan D. Hernandez, et al. Assessing the evolutionary impact of amino acid mutations in the human genome. 2008, PLoS Genetics.