The effect of recurrent mutation on the frequency spectrum of a segregating site and the age of an allele.

The sample frequency spectrum of a segregating site is the probability distribution of a sample of alleles from a genetic locus, conditional on observing the sample to be polymorphic. This distribution is widely used in population genetic inference, including statistical tests of neutrality in which a skew in the observed frequency spectrum across independent sites is taken as a signature of departure from neutral evolution. Theoretical aspects of the frequency spectrum have been well studied and several interesting results are available, but they usually assume that a site has undergone at most one mutation event in the history of the sample. Here, we extend previous theoretical results by allowing for at most two mutation events per site, under a general finite-allele model in which the mutation rate is independent of the current allelic state but the transition matrix is otherwise completely arbitrary. Our results apply to both nested and nonnested mutations. Only nested mutations have been addressed previously; here we show that nonnested mutations are more likely to be observed, except for very small sample sizes. Further, for any mutation transition matrix, we obtain the joint sample frequency spectrum of the two mutant alleles at a triallelic site and derive a closed-form formula for the expected age of the younger of the two mutations given their frequencies in the population. Several large-scale resequencing projects for various species are currently under way, and the resulting data will include some triallelic polymorphisms. The theoretical results described in this paper should prove useful in population genomic analyses of such data.
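
For orientation, the single-mutation baseline that this work extends can be sketched numerically. The snippet below is not taken from the paper. It assumes the standard neutral coalescent with constant population size, at most one mutation per site, and a known ancestral state, under which the probability that the mutant allele appears i times in a sample of size n, given that the site is segregating, is (1/i) divided by the harmonic number H_{n-1}. The function name `neutral_sfs` is hypothetical.

```python
from fractions import Fraction


def neutral_sfs(n):
    """Single-mutation neutral sample frequency spectrum (illustrative sketch).

    Assumes the standard neutral coalescent with at most one mutation per
    site and a known ancestral state. Returns a list whose entry i-1 is
    P(mutant count = i | site is segregating), for i = 1, ..., n-1.
    """
    if n < 2:
        raise ValueError("sample size must be at least 2")
    # Harmonic number H_{n-1} normalizes the 1/i weights.
    harmonic = sum(Fraction(1, k) for k in range(1, n))
    return [Fraction(1, i) / harmonic for i in range(1, n)]


if __name__ == "__main__":
    for count, prob in enumerate(neutral_sfs(5), start=1):
        print(count, prob)
```

For a sample of size 5 the probabilities for mutant counts 1 through 4 are 12/25, 6/25, 4/25 and 3/25. Exact fractions are used so that the normalization can be checked without floating-point error. The two-mutation and triallelic results described in the abstract are not reproduced here.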
