PhaseTypeR: phase-type distributions in R with reward transformations and a view towards population genetics

Phase-type distributions are a general class of models traditionally used in actuarial science and queuing theory, and more recently in population genetics. A phase-type distributed random variable is the time to absorption in a discrete- or continuous-time Markov chain on a finite state space with an absorbing state. The R package PhaseTypeR contains all the key functions for both continuous (PH) and discrete (DPH) phase-type distributions: mean, (co)variance, probability density function, cumulative distribution function, quantile function, random sampling and reward transformations. We have also implemented the multivariate continuous case (MPH) and the multivariate discrete case (MDPH). We illustrate the usage of PhaseTypeR in simple examples from population genetics (e.g. the time until the most recent common ancestor or the total number of mutations in an alignment of homologous DNA sequences), and we demonstrate the power of PhaseTypeR in more involved applications from population genetics, such as the coalescent with recombination and the structured coalescent. The multivariate distributions and the ability to reward-transform are particularly important in population genetics, and are a unique feature of PhaseTypeR.
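To make the definition concrete, here is a minimal sketch in plain Python, not the PhaseTypeR API, of the mean of a continuous phase-type distribution and of its reward-transformed version. It uses the standard identity E[Y] = α(−S)⁻¹r, where α is the initial distribution, S the sub-intensity matrix and r the reward vector (all ones for the plain time to absorption). The helper names `solve` and `ph_mean` are hypothetical. The example assumes the standard coalescent for a sample of four sequences, with time scaled so that each pair of lineages coalesces at rate 1.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    # Augmented matrix [A | b] with Fraction entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot equals 1.
        M[col] = [v / M[col][col] for v in M[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[n] for row in M]


def ph_mean(init_probs, subint_mat, rewards=None):
    """Mean of a continuous phase-type variable: alpha * (-S)^{-1} * r.

    With rewards=None, r is a vector of ones (time to absorption).
    Otherwise r holds the reward earned per unit of time in each state.
    """
    n = len(subint_mat)
    if rewards is None:
        rewards = [1] * n
    neg_S = [[-v for v in row] for row in subint_mat]
    u = solve(neg_S, rewards)  # u = (-S)^{-1} r
    return sum(Fraction(a) * x for a, x in zip(init_probs, u))


# Assumed example: transient states are 4, 3 and 2 remaining lineages.
# With k lineages, the next coalescence occurs at rate k(k-1)/2.
S = [
    [-6, 6, 0],
    [0, -3, 3],
    [0, 0, -1],
]
alpha = [1, 0, 0]  # the chain starts with all four lineages present

# Time to the most recent common ancestor.
tmrca_mean = ph_mean(alpha, S)

# Total branch length: each state is rewarded with its number of lineages.
total_branch_mean = ph_mean(alpha, S, rewards=[4, 3, 2])

print(tmrca_mean)         # 3/2
print(total_branch_mean)  # 11/3
```

Both values agree with the classical coalescent results for a sample of size n = 4: 2(1 − 1/n) = 3/2 for the expected time to the most recent common ancestor, and 2(1 + 1/2 + 1/3) = 11/3 for the expected total branch length.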
