Generator estimation of Markov jump processes

Estimating the generator of a continuous-time Markov jump process from incomplete data is a problem that arises in applications ranging from machine learning to molecular dynamics. Several methods have been devised for this purpose: a quadratic programming approach (cf. [D.T. Crommelin, E. Vanden-Eijnden, Fitting timeseries by continuous-time Markov chains: a quadratic programming approach, J. Comp. Phys. 217 (2006) 782-805]), a resolvent method (cf. [T. Müller, Modellierung von Proteinevolution, PhD thesis, Heidelberg, 2001]), and various implementations of an expectation-maximization algorithm ([S. Asmussen, O. Nerman, M. Olsson, Fitting phase-type distributions via the EM algorithm, Scand. J. Stat. 23 (1996) 419-441; I. Holmes, G.M. Rubin, An expectation maximization algorithm for training hidden substitution models, J. Mol. Biol. 317 (2002) 753-764; U. Nodelman, C.R. Shelton, D. Koller, Expectation maximization and complex duration distributions for continuous time Bayesian networks, in: Proceedings of the Twenty-First Conference on Uncertainty in AI (UAI), 2005, pp. 421-430; M. Bladt, M. Sørensen, Statistical inference for discretely observed Markov jump processes, J. R. Statist. Soc. B 67 (2005) 395-410]). Some of these methods, however, seem to be known only within a particular research community and have later been reinvented in a different context. The purpose of this paper is to compile a catalogue of existing approaches, to compare their strengths and weaknesses, and to test their performance in a series of numerical examples. These examples include carefully chosen model problems and an application to a time series from molecular dynamics.
