Deep resequencing reveals excess rare recent variants consistent with explosive population growth

Accurately determining the distribution of rare variants is an important goal of human genetics, but resequencing of a sample large enough for this purpose has been unfeasible until now. Here, we applied Sanger sequencing of genomic PCR amplicons to resequence the diabetes-associated genes KCNJ11 and HHEX in 13,715 people (10,422 European Americans and 3,293 African Americans) and validated amplicons potentially harbouring rare variants using 454 pyrosequencing. We observed far more variation (expected variant-site count ∼578) than would have been predicted on the basis of earlier surveys, which could only capture the distribution of common variants. By comparison with earlier estimates based on common variants, our model shows a clear genetic signal of accelerating population growth, suggesting that humanity harbours a myriad of rare, deleterious variants, and that disease risk and the burden of disease in contemporary populations may be heavily influenced by the distribution of rare variants.
