Exact Goodness‐of‐Fit Tests for Markov Chains

Goodness‐of‐fit tests are useful in assessing whether a statistical model is consistent with available data. However, the usual χ² asymptotics often fail, either because the data are sparse or because a nonstandard test statistic is of interest. In this article, we describe exact goodness‐of‐fit tests for first‐ and higher‐order Markov chains, with particular attention given to time‐reversible ones. The tests are obtained by conditioning on the sufficient statistics for the transition probabilities and are implemented by simple Monte Carlo sampling or by Markov chain Monte Carlo. They apply both to single and to multiple sequences and allow a free choice of test statistic. Three examples are given. The first concerns multiple sequences of dry and wet January days for the years 1948–1983 at Snoqualmie Falls, Washington State, and suggests that standard analysis may be misleading. The second concerns a four‐state DNA sequence and lends support to the original conclusion that a second‐order Markov chain provides an adequate fit to the data. The last involves six‐state atomistic data arising in molecular conformational dynamics simulation of solvated alanine dipeptide and points to strong evidence against a first‐order reversible Markov chain at 6‐picosecond time steps.
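To make the conditioning idea concrete, the following is a minimal sketch and not the article's implementation. It relies on a standard fact: under a first‐order Markov chain, every sequence with the same initial state and the same transition counts has the same probability, so the conditional null distribution is uniform over that set. For a very short sequence the set can be enumerated by brute force, which gives an exact p‐value without any sampling; for realistic lengths this enumeration is infeasible, and Monte Carlo or Markov chain Monte Carlo sampling would be needed instead. The function names and the choice of test statistic (the count of one length‐3 pattern) are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def transition_counts(seq):
    """Count one-step transitions (a, b); with the initial state,
    these are the sufficient statistics for a first-order chain."""
    return Counter(zip(seq, seq[1:]))


def triple_count(seq, triple):
    """Illustrative test statistic: occurrences of one length-3 pattern."""
    return sum(1 for i in range(len(seq) - 2) if tuple(seq[i:i + 3]) == triple)


def exact_p_value(observed, states, statistic):
    """Exact conditional p-value by brute-force enumeration.

    Enumerates every sequence with the same length, initial state and
    transition counts as `observed` (all equally likely under the
    first-order null) and returns the fraction whose statistic is at
    least the observed value, together with the size of the set.
    Only feasible for short sequences.
    """
    observed = tuple(observed)
    target = transition_counts(observed)
    t_obs = statistic(observed)
    total = 0
    extreme = 0
    for tail in product(states, repeat=len(observed) - 1):
        candidate = (observed[0],) + tail
        if transition_counts(candidate) == target:
            total += 1
            if statistic(candidate) >= t_obs:
                extreme += 1
    return extreme / total, total


if __name__ == "__main__":
    # Hypothetical binary sequence, chosen only for illustration.
    seq = (0, 0, 1, 0)
    p, n = exact_p_value(seq, (0, 1), lambda s: triple_count(s, (0, 0, 1)))
    print(p, n)
```

Because the statistic is passed in as a function, any other summary of the sequence can be substituted, which mirrors the free choice of test statistic described above.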
