Simultaneous inference of selection and population growth from patterns of variation in the human genome

Natural selection and demographic forces can have similar effects on patterns of DNA polymorphism. Therefore, to infer selection from samples of DNA sequences, one must simultaneously account for demographic effects. Here we take a model-based approach to this problem by developing predictions for patterns of polymorphism in the presence of both population size change and natural selection. If data are available from different functional classes of variation, and a priori information suggests that mutations in one of those classes are selectively neutral, then the putatively neutral class can be used to infer demographic parameters, and inferences regarding selection on other classes can be performed given demographic parameter estimates. This procedure is more robust to assumptions regarding the true underlying demography than previous approaches to detecting and analyzing selection. We apply this method to a large polymorphism data set from 301 human genes and find (i) widespread negative selection acting on standing nonsynonymous variation, (ii) that the fitness effects of nonsynonymous mutations are well predicted by several measures of amino acid exchangeability, especially site-specific methods, and (iii) strong evidence for very recent population growth.
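The two-class idea can be illustrated with a minimal sketch. It covers only the constant-population-size special case and uses the stationary Poisson random field frequency density of Sawyer and Hartl (1992), in which the expected number of sites at derived-allele frequency x is proportional to (1 - e^{-2γ(1-x)}) / ((1 - e^{-2γ}) x (1 - x)). The method described above also models population size change, which this sketch omits. The function names, the sample size of 20, and the scaled selection coefficient γ = -5 are illustrative assumptions, not values taken from the paper.

```python
import math

def selection_shape(x, gamma):
    """(1 - exp(-2*gamma*(1-x))) / (1 - exp(-2*gamma)).

    Tends to (1 - x) as gamma -> 0, which is the neutral case.
    Intended for moderate |gamma|; very large negative gamma overflows.
    """
    if abs(gamma) < 1e-8:
        return 1.0 - x
    return math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)

def expected_sfs(n, gamma, theta=1.0, steps=4000):
    """Expected number of sites with i = 1..n-1 derived alleles in a sample of n.

    Assumes a constant-size population at stationarity (a simplification).
    The integral over allele frequency is evaluated with the midpoint rule.
    """
    h = 1.0 / steps
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * h
            # Binomial sampling term times density, with the 1/(x(1-x))
            # factor folded into the exponents to keep the integrand smooth.
            total += x ** (i - 1) * (1.0 - x) ** (n - i - 1) * selection_shape(x, gamma)
        sfs.append(theta * math.comb(n, i) * total * h)
    return sfs

# Hypothetical comparison: a putatively neutral class versus a class
# under negative selection (gamma = -5 is an arbitrary illustrative value).
neutral = expected_sfs(20, 0.0)
selected = expected_sfs(20, -5.0)
print("singleton share, neutral: ", neutral[0] / sum(neutral))
print("singleton share, selected:", selected[0] / sum(selected))
```

In the neutral case the expected counts reduce to θ/i, so the neutral class pins down the baseline spectrum, and an excess of rare variants in the other class relative to that baseline is the signature of negative selection. Under a changing population size the neutral spectrum itself shifts, which is why demographic parameters need to be estimated before selection is inferred.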
