Statistical Inference in the Wright–Fisher Model Using Allele Frequency Data

Abstract: The Wright–Fisher model provides an elegant mathematical framework for understanding allele frequency data. In particular, the model can be used to infer the demographic history of species and identify loci under selection. A crucial quantity for inference under the Wright–Fisher model is the distribution of allele frequencies (DAF). Despite the apparent simplicity of the model, the calculation of the DAF is challenging. We review and discuss strategies for approximating the DAF, and how these are used in methods that perform inference from allele frequency data. Various evolutionary forces can be incorporated in the Wright–Fisher model, and we consider these in turn. We begin our review with the basic bi-allelic Wright–Fisher model where random genetic drift is the only evolutionary force. We then consider mutation, migration, and selection. In particular, we compare diffusion-based and moment-based methods in terms of accuracy, computational efficiency, and analytical tractability. We conclude with a brief overview of the multi-allelic process with a general mutation model.
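As a minimal illustration of the basic bi-allelic, drift-only case mentioned above, the sketch below computes the DAF exactly by repeated multiplication with the binomial transition matrix and compares its first two moments with the standard closed-form drift moments. It is not taken from the paper: the function names (`wf_transition_matrix`, `daf`, `daf_moments`, `drift_moments`) and the parameter `n_copies` (the number of gene copies, i.e. 2N for a diploid population of size N) are assumptions introduced here for illustration.

```python
from math import comb


def wf_transition_matrix(n_copies):
    """Pure-drift Wright-Fisher transition matrix.

    P[i][j] is the binomial probability of j allele copies in the next
    generation given i copies now, with n_copies gene copies in total.
    """
    matrix = []
    for i in range(n_copies + 1):
        p = i / n_copies
        row = [comb(n_copies, j) * p**j * (1 - p) ** (n_copies - j)
               for j in range(n_copies + 1)]
        matrix.append(row)
    return matrix


def daf(n_copies, start_count, generations):
    """Exact distribution of the allele count after the given number of generations."""
    matrix = wf_transition_matrix(n_copies)
    dist = [0.0] * (n_copies + 1)
    dist[start_count] = 1.0  # all probability mass on the starting count
    for _ in range(generations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n_copies + 1))
                for j in range(n_copies + 1)]
    return dist


def daf_moments(dist):
    """Mean and variance of the allele frequency implied by a count distribution."""
    n_copies = len(dist) - 1
    mean = sum(j / n_copies * q for j, q in enumerate(dist))
    var = sum((j / n_copies - mean) ** 2 * q for j, q in enumerate(dist))
    return mean, var


def drift_moments(n_copies, start_count, generations):
    """Closed-form mean and variance of the allele frequency under pure drift."""
    x0 = start_count / n_copies
    var = x0 * (1 - x0) * (1 - (1 - 1 / n_copies) ** generations)
    return x0, var


if __name__ == "__main__":
    dist = daf(n_copies=20, start_count=5, generations=10)
    print("exact moments:      ", daf_moments(dist))
    print("closed-form moments:", drift_moments(20, 5, 10))
```

The exact approach costs on the order of `n_copies` squared operations per generation, which is one reason approximations of the DAF become attractive for realistic population sizes; the closed-form moments, by contrast, are available at negligible cost but do not by themselves determine the full distribution.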
