Phase-type distributions in population genetics

Probability modelling for DNA sequence evolution is well established and provides a rich framework for understanding genetic variation between samples of individuals from one or more populations. We show that both classical and more recent models for coalescence (with or without recombination) can be described in terms of so-called phase-type theory, in which complicated and tedious calculations are circumvented by the use of matrices. Applying phase-type theory consists of describing the stochastic model as a continuous-time Markov chain: one sets up an appropriate state space and calculates the corresponding intensity and reward matrices. Formulae of interest are then expressed in terms of these matrices. We illustrate this with a few examples: we calculate the mean, variance and even higher-order moments of the site frequency spectrum in multiple merger coalescent models, and we analyse the mean and variance of the number of segregating sites for multiple samples in the two-locus ancestral recombination graph. We believe that phase-type theory has great potential as a tool for analysing probability models in population genetics. The compact matrix notation is useful for clarifying current models, in particular their formal manipulation (calculation), and also for developing them further or extending them.
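The recipe above (state space, intensity matrix, reward matrix, matrix formulae) can be illustrated with a minimal sketch. It assumes the standard Kingman coalescent rather than the multiple merger models treated here, with time measured so that each pair of lineages coalesces at rate 1. The state space is the block-counting process: a state is a vector a = (a_1, ..., a_{n-1}), where a_i is the number of lineages ancestral to exactly i sampled genes. Writing S for the subintensity matrix, U = (-S)^{-1} for its Green's matrix, alpha for the initial distribution and r for a reward vector, the reward-transformed variable Y = ∫ r(X_t) dt satisfies E[Y] = alpha U r and E[Y^2] = 2 alpha U diag(r) U r. All function names (`transitions`, `subintensity`, `inverse`, `reward_moments`) are hypothetical, and exact rational arithmetic is used only so the results can be compared with closed forms.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def transitions(a, n):
    """Yield (rate, next_state) for each pairwise merger from block-counting state a.

    Kingman coalescent: every pair of lineages merges at rate 1.
    next_state is None when the merger creates the MRCA (absorption).
    """
    sizes = [i + 1 for i, count in enumerate(a) if count > 0]
    for i, j in combinations_with_replacement(sizes, 2):
        ai, aj = a[i - 1], a[j - 1]
        rate = ai * (ai - 1) // 2 if i == j else ai * aj
        if rate == 0:
            continue
        if i + j == n:  # the last two lineages merge
            yield rate, None
            continue
        b = list(a)
        b[i - 1] -= 1
        b[j - 1] -= 1
        b[i + j - 1] += 1
        yield rate, tuple(b)


def subintensity(n):
    """Enumerate transient states reachable from (n, 0, ..., 0) and build S."""
    start = (n,) + (0,) * (n - 2)
    states, queue = [start], [start]
    while queue:
        a = queue.pop()
        for _, b in transitions(a, n):
            if b is not None and b not in states:
                states.append(b)
                queue.append(b)
    index = {s: k for k, s in enumerate(states)}
    m = len(states)
    S = [[Fraction(0)] * m for _ in range(m)]
    for s in states:
        for rate, b in transitions(s, n):
            S[index[s]][index[s]] -= rate  # total exit rate on the diagonal
            if b is not None:
                S[index[s]][index[b]] += rate
    return states, S


def inverse(M):
    """Exact Gauss-Jordan inverse of a nonsingular matrix of Fractions."""
    m = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(m)] for i, row in enumerate(M)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[m:] for row in A]


def reward_moments(n, reward):
    """Mean and variance of Y = integral of reward(state) dt until absorption."""
    states, S = subintensity(n)
    U = inverse([[-x for x in row] for row in S])  # Green's matrix (-S)^{-1}
    r = [Fraction(reward(s)) for s in states]
    Ur = [sum(u * v for u, v in zip(row, r)) for row in U]
    # alpha = (1, 0, ..., 0), so alpha U is simply the first row of U.
    mean = Ur[0]
    second = 2 * sum(u * ri * v for u, ri, v in zip(U[0], r, Ur))
    return mean, second - mean ** 2


n = 4
# Total branch length: the reward is the number of lineages in the state.
mean_L, var_L = reward_moments(n, lambda a: sum(a))
print(mean_L, var_L)  # 11/3 49/9
# Length of branches subtending exactly i samples: the reward is a_i.
print([str(reward_moments(n, lambda a, i=i: a[i - 1])[0]) for i in range(1, n)])  # ['2', '1', '2/3']
```

The results agree with the classical Kingman values E[L] = 2 sum_{i<n} 1/i, Var[L] = 4 sum_{i<n} 1/i^2 and E[L_i] = 2/i. Multiplying E[L_i] by a per-lineage mutation rate theta/2 gives the expected site frequency spectrum under the infinite-sites assumption. Swapping in a different model only requires changing `transitions`; the matrix formulae stay the same, which is the point of the phase-type formulation.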
