Estimation for General Birth-Death Processes

Birth-death processes (BDPs) are continuous-time Markov chains that track the number of “particles” in a system over time. While BDPs are widely used in population biology, genetics, and ecology, statistical inference for the instantaneous particle birth and death rates remains largely limited to restrictive linear BDPs, in which per-particle birth and death rates are constant. Researchers often observe the number of particles only at discrete times, which necessitates data augmentation procedures such as the expectation-maximization (EM) algorithm to find maximum likelihood estimates (MLEs). For BDPs on finite state spaces, powerful matrix methods exist for computing the conditional expectations needed in the E-step of the EM algorithm. For BDPs on infinite state spaces, closed-form solutions for the E-step are available for some linear models, but most previous work has resorted to time-consuming simulation. We show that, for any general BDP with arbitrary rates, the E-step conditional expectations can be expressed as convolutions of computable transition probabilities. This observation, together with a convenient continued fraction representation of the Laplace transforms of the transition probabilities, allows novel and efficient computation of the conditional expectations for all BDPs, eliminating the need to truncate the state space or to resort to costly simulation. We use this insight to derive EM algorithms that yield maximum likelihood estimates for general BDPs characterized by various rate models, including generalized linear models (GLMs). We show that our Laplace convolution technique outperforms competing methods when such methods are available, and we demonstrate a technique to accelerate the convergence of the EM algorithm. We validate our approach using synthetic data and then apply our methods to cancer cell growth and to the estimation of mutation parameters in microsatellite evolution.
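The computational ingredients named above can be illustrated on the simplest transition, from state 0 back to state 0. The following is a minimal Python sketch, not the authors' implementation. It assumes birth rates λ_n and death rates μ_n supplied as functions of the state n (with μ_0 = 0). From the forward Kolmogorov equations, the Laplace transform f_00(s) of P_00(t) is the continued fraction f_00(s) = 1/(s + λ_0 - λ_0 μ_1/(s + λ_1 + μ_1 - λ_1 μ_2/(s + λ_2 + μ_2 - ...))). The sketch evaluates it with the modified Lentz algorithm and inverts it numerically with the Abate-Whitt Euler algorithm. The choice of inversion method and of M = 15 is our assumption. Because the Laplace transform of a convolution is the product of the transforms, an E-step quantity such as the expected time spent in state k over [0, t] given X(0) = i and X(t) = j, (1/P_ij(t)) ∫_0^t P_ik(u) P_kj(t - u) du, can be obtained by inverting f_ik(s) f_kj(s). The sketch only handles i = j = k = 0, since general f_ij(s) need extra continued fraction bookkeeping not shown here. The function names p00_laplace, euler_invert, and expected_time_in_zero are hypothetical.

```python
import math
from typing import Callable

Rate = Callable[[int], float]  # state n -> rate (birth lambda_n or death mu_n)


def p00_laplace(s: complex, birth: Rate, death: Rate,
                tol: float = 1e-14, max_terms: int = 100_000) -> complex:
    """Laplace transform f_00(s) of P_00(t) from the continued fraction
    1/(s + l0 - l0*m1/(s + l1 + m1 - l1*m2/(s + l2 + m2 - ...))),
    evaluated with the modified Lentz algorithm."""
    tiny = 1e-30                 # replaces zero denominators (Lentz's trick)
    f = complex(tiny)            # leading term b_0 is 0, so start from tiny
    c, d = f, 0j
    for n in range(1, max_terms + 1):
        if n == 1:               # first level: numerator 1, denominator s + l0
            a, b = 1.0, s + birth(0)
        else:                    # numerator -l_{n-2}*m_{n-1}, denominator s + l_{n-1} + m_{n-1}
            a = -birth(n - 2) * death(n - 1)
            b = s + birth(n - 1) + death(n - 1)
        d = b + a * d
        d = 1.0 / (d if d != 0 else tiny)
        c = b + a / c
        if c == 0:
            c = tiny
        delta = c * d
        f *= delta
        if abs(delta - 1.0) < tol:   # successive convergents agree
            return f
    raise RuntimeError("continued fraction did not converge")


def euler_invert(F: Callable[[complex], complex], t: float, M: int = 15) -> float:
    """Abate-Whitt Euler algorithm:
    f(t) ~ (10^(M/3) / t) * sum_{k=0}^{2M} eta_k * Re F(beta_k / t)."""
    if t <= 0:
        raise ValueError("t must be positive")
    shift = M * math.log(10.0) / 3.0          # real part of beta_k
    total = 0.0
    for k in range(2 * M + 1):
        if k == 0:
            xi = 0.5
        elif k <= M:
            xi = 1.0
        else:                                  # binomial (Euler) averaging weights
            xi = sum(math.comb(M, j) for j in range(k - M, M + 1)) / 2.0 ** M
        beta = complex(shift, math.pi * k)
        total += (-1) ** k * xi * F(beta / t).real
    return 10.0 ** (M / 3.0) * total / t


def expected_time_in_zero(t: float, birth: Rate, death: Rate) -> float:
    """E[time spent in state 0 on [0, t] | X(0) = 0, X(t) = 0].

    The numerator int_0^t P_00(u) P_00(t - u) du is a convolution, so its
    Laplace transform is f_00(s)**2."""
    def f00(s: complex) -> complex:
        return p00_laplace(s, birth, death)

    numerator = euler_invert(lambda s: f00(s) ** 2, t)
    return numerator / euler_invert(f00, t)


if __name__ == "__main__":
    nu, mu = 2.0, 1.0                          # immigration-death example rates
    birth = lambda n: nu
    death = lambda n: mu * n
    t = 1.0
    closed_form = math.exp(-(nu / mu) * (1.0 - math.exp(-mu * t)))
    print(euler_invert(lambda s: p00_laplace(s, birth, death), t), closed_form)
    print(expected_time_in_zero(t, birth, death))
```

As a check, the immigration-death (M/M/∞) process with λ_n = ν and μ_n = nμ started at 0 has P_00(t) = exp(-(ν/μ)(1 - e^(-μt))). The numerically inverted value can therefore be compared directly with this closed form. The same Laplace-product idea extends to the birth and death counts in the E-step, which weight convolutions such as P_ik * P_(k+1)j by the rates.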
