Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants

Targeted capture combined with massively parallel exome sequencing is a promising approach to identify genetic variants implicated in human traits. We report exome sequencing of 200 individuals from Denmark with targeted capture of 18,654 coding genes and sequence coverage of each individual exome at an average depth of 12-fold. On average, about 95% of the target regions were covered by at least one read. We identified 121,870 SNPs in the sample population, including 53,081 coding SNPs (cSNPs). Using a statistical method for SNP calling and an estimation of allelic frequencies based on our population data, we derived the allele frequency spectrum of cSNPs with a minor allele frequency greater than 0.02. We identified a 1.8-fold excess of deleterious, non-synonymous cSNPs over synonymous cSNPs in the low-frequency range (minor allele frequencies between 2% and 5%). This excess was more pronounced for X-linked SNPs, suggesting that deleterious substitutions are primarily recessive.
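As a rough illustration of the kind of comparison described above, the following sketch bins SNPs by minor allele frequency and compares the non-synonymous to synonymous ratio in the 2% to 5% range against the same ratio at higher frequencies. It is not the statistical method used in the study: it assumes hard-called allele counts rather than frequencies estimated from low-depth reads, and the function names, the toy data, and the choice of the common-frequency bin as the baseline are all assumptions made for illustration.

```python
from collections import Counter


def minor_allele_frequency(alt_count, total_alleles):
    """Return the frequency of the rarer of the two alleles."""
    freq = alt_count / total_alleles
    return min(freq, 1.0 - freq)


def count_by_bin(snps, low=0.02, mid=0.05):
    """Count SNPs per (frequency bin, effect).

    SNPs with MAF <= low are skipped, mirroring the 0.02 cutoff in the text.
    'low' covers low < MAF <= mid; 'common' covers everything above mid.
    """
    counts = Counter()
    for alt_count, total_alleles, effect in snps:
        maf = minor_allele_frequency(alt_count, total_alleles)
        if maf <= low:
            continue
        bin_name = "low" if maf <= mid else "common"
        counts[(bin_name, effect)] += 1
    return counts


def low_frequency_excess(counts):
    """Ratio of nonsyn/syn in the low bin to nonsyn/syn in the common bin.

    Using the common bin as the baseline is an assumption of this sketch.
    """
    low_ratio = counts[("low", "nonsyn")] / counts[("low", "syn")]
    common_ratio = counts[("common", "nonsyn")] / counts[("common", "syn")]
    return low_ratio / common_ratio


# Hypothetical toy data: (alternate allele count, total alleles, effect).
# 200 diploid individuals give 400 alleles at an autosomal site.
toy_snps = [
    (12, 400, "nonsyn"),
    (16, 400, "nonsyn"),
    (10, 400, "nonsyn"),
    (14, 400, "syn"),
    (100, 400, "nonsyn"),
    (120, 400, "syn"),
    (300, 400, "syn"),    # alternate allele is the major allele here
    (4, 400, "nonsyn"),   # MAF 0.01, below the cutoff, so skipped
]

toy_counts = count_by_bin(toy_snps)
toy_excess = low_frequency_excess(toy_counts)
print(toy_counts)
print(toy_excess)
```

For X-linked sites the total allele count would depend on the number of males and females sampled, so `total_alleles` is passed per SNP rather than fixed.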
