Bayesian Inference of Natural Selection from Allele Frequency Time Series

The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We develop a Bayesian method that uses allele frequency time series data to infer the parameters of general diploid selection, along with allele age, in nonequilibrium populations. We introduce a novel path augmentation approach, in which we use Markov chain Monte Carlo to integrate over the space of allele frequency trajectories consistent with the observed data. Using simulations, we show that this approach has good power to estimate selection coefficients and allele age. Moreover, when applying our approach to data on horse coat color, we find that ignoring a relevant demographic history can significantly bias the results of inference. Our approach is available as a C++ software package.
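To make the setting concrete, the following is a minimal sketch of the kind of model the abstract describes, not the authors' C++ implementation. It assumes a discrete Wright-Fisher population with general diploid selection, a population size that may change each generation, and binomial sampling of chromosomes at a few time points. The fitness parameterization (1, 1 + s_het, 1 + s_hom) and all function names are illustrative assumptions. The last function evaluates the log-likelihood of the observed counts given one candidate trajectory, which is the quantity a path augmentation sampler would evaluate for each proposed trajectory.

```python
import math
import random


def next_frequency(x, s_het, s_hom):
    """Expected derived-allele frequency after one round of selection.

    Assumed genotype fitnesses: 1 (ancestral homozygote),
    1 + s_het (heterozygote), 1 + s_hom (derived homozygote).
    """
    w_het, w_hom = 1.0 + s_het, 1.0 + s_hom
    mean_w = x * x * w_hom + 2.0 * x * (1.0 - x) * w_het + (1.0 - x) ** 2
    return (x * x * w_hom + x * (1.0 - x) * w_het) / mean_w


def binomial_draw(rng, n, p):
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_trajectory(x0, s_het, s_hom, sizes, rng):
    """Simulate allele frequencies forward in time.

    sizes[t] is the (hypothetical) diploid population size in generation t,
    so a nonequilibrium demography is just a non-constant list.
    """
    traj = [x0]
    for n_dip in sizes:
        p = next_frequency(traj[-1], s_het, s_hom)
        traj.append(binomial_draw(rng, 2 * n_dip, p) / (2 * n_dip))
    return traj


def log_binomial_likelihood(traj, samples):
    """Log-likelihood of sampled counts given a full trajectory.

    samples is a list of (generation, n_chromosomes, derived_count).
    """
    total = 0.0
    for t, n, k in samples:
        x = traj[t]
        # A fixed or lost allele cannot produce the other allele in a sample.
        if (x == 0.0 and k > 0) or (x == 1.0 and k < n):
            return -math.inf
        total += math.log(math.comb(n, k))
        if k > 0:
            total += k * math.log(x)
        if k < n:
            total += (n - k) * math.log(1.0 - x)
    return total


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical demography: 50 generations at N = 100, then 50 at N = 400.
    sizes = [100] * 50 + [400] * 50
    traj = simulate_trajectory(0.1, 0.01, 0.02, sizes, rng)
    # Hypothetical samples of 20 chromosomes at three time points.
    samples = [(t, 20, binomial_draw(rng, 20, traj[t])) for t in (0, 50, 100)]
    print(samples, log_binomial_likelihood(traj, samples))
```

In a full sampler, the trajectory between sampling times would itself be proposed and accepted or rejected together with the selection parameters and allele age. This sketch only shows the forward model and the sampling likelihood.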
