Testing Symmetric Markov Chains From a Single Trajectory

Classical distribution testing assumes access to i.i.d. samples from the distribution being tested. We initiate the study of Markov chain testing, assuming access to a single trajectory of a Markov chain. In particular, we observe a single trajectory X0, ..., Xt, ... of an unknown, symmetric, finite-state Markov chain M. We do not control the starting state X0, and we cannot restart the chain. Given this single trajectory, the goal is to test whether M is identical to a model Markov chain M0, or far from it under an appropriate notion of difference. We propose a measure of difference between two Markov chains, motivated by the early work of Kazakos [Kaz78], which captures the scaling behavior of the total variation distance between trajectories sampled from the two chains as the length of these trajectories grows. We provide efficient testers and information-theoretic lower bounds for testing identity of symmetric Markov chains under our proposed measure of difference. These bounds are tight up to logarithmic factors if the hitting times of the model chain M0 are O(n), where n is the size of the state space.
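
To make the setting concrete, the following is a minimal sketch and not the tester from the paper. It samples one unrestarted trajectory from a symmetric chain, tabulates the observed transitions, and evaluates one candidate measure of difference in the spirit of Kazakos's Bhattacharyya analysis: one minus the spectral radius of the entrywise geometric mean of the two transition matrices. That specific formula, and the names `sample_trajectory`, `transition_counts`, and `sqrt_spectral_distance`, are assumptions made for illustration; the abstract itself does not state the definition.

```python
import numpy as np


def sample_trajectory(P, x0, length, rng):
    """Sample X_0, ..., X_{length-1} from transition matrix P, starting at x0.

    The chain is never restarted: each state depends only on the previous one.
    """
    n = P.shape[0]
    traj = np.empty(length, dtype=int)
    traj[0] = x0
    for t in range(1, length):
        traj[t] = rng.choice(n, p=P[traj[t - 1]])
    return traj


def transition_counts(traj, n):
    """Count how often each transition i -> j appears along the trajectory."""
    counts = np.zeros((n, n), dtype=int)
    np.add.at(counts, (traj[:-1], traj[1:]), 1)
    return counts


def sqrt_spectral_distance(P, Q):
    """Assumed measure: 1 - spectral radius of the entrywise sqrt(P * Q).

    For symmetric P and Q the matrix is symmetric, so eigvalsh applies.
    """
    G = np.sqrt(P * Q)
    return 1.0 - np.max(np.abs(np.linalg.eigvalsh(G)))


# Hypothetical 3-state symmetric chains used only for illustration.
M0 = np.array([[0.8, 0.1, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8]])
M = np.array([[0.4, 0.3, 0.3],
              [0.3, 0.4, 0.3],
              [0.3, 0.3, 0.4]])

rng = np.random.default_rng(0)
trajectory = sample_trajectory(M, x0=0, length=500, rng=rng)
counts = transition_counts(trajectory, n=3)
print("observed transition counts:\n", counts)
print("distance(M0, M0):", sqrt_spectral_distance(M0, M0))
print("distance(M0, M): ", sqrt_spectral_distance(M0, M))
```

Under this assumed measure, identical chains are at distance zero because a stochastic matrix has spectral radius one, and a strictly positive value indicates that trajectories from the two chains become distinguishable as their length grows.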

[1] Daniel M. Kane, et al. Robust Learning of Fixed-Structure Bayesian Networks, 2016, NeurIPS.

[2] Simon Tavaré, et al. Serial dependence of observations leading to contingency tables, and corrections to chi-squared statistics, 1983.

[3] Ronitt Rubinfeld, et al. Testing Shape Restrictions of Discrete Distributions, 2015, Theory of Computing Systems.

[4] Ronitt Rubinfeld, et al. Testing Closeness of Discrete Distributions, 2010, JACM.

[5] Gregory Valiant, et al. An Automatic Inequality Prover and Instance Optimal Identity Testing, 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[6] Constantinos Daskalakis, et al. Which Distribution Distances are Sublinearly Testable?, 2017, Electron. Colloquium Comput. Complex.

[7] Rocco A. Servedio, et al. Testing equivalence between distributions using conditional samples, 2014, SODA.

[8] Ronitt Rubinfeld, et al. Testing Properties of Collections of Distributions, 2013, Theory Comput.

[9] Csaba Szepesvári, et al. Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path, 2015, NIPS.

[10] Liam Paninski, et al. A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data, 2008, IEEE Transactions on Information Theory.

[11] Paul Rochet, et al. Hypothesis testing for Markovian models with random time observations, 2015.

[12] David S. Moore, et al. The Effect of Dependence on Chi Squared Tests of Fit, 1982.

[13] R. A. Fisher, et al. Design of Experiments, 1936.

[14] Leandro Pardo, et al. On Size Increase for Goodness of Fit Tests When Observations Are Positively Dependent, 2002.

[15] M. Bartlett. The frequency goodness of fit test for probability chains, 1951, Mathematical Proceedings of the Cambridge Philosophical Society.

[16] Oded Goldreich. A Brief Introduction to Property Testing, 2010, Property Testing.

[17] A. Agresti, et al. Categorical Data Analysis, 1991, International Encyclopedia of Statistical Science.

[18] Ilias Diakonikolas, et al. Optimal Algorithms for Testing Closeness of Discrete Distributions, 2013, SODA.

[19] A. Scott, et al. The Analysis of Categorical Data from Complex Sample Surveys: Chi-Squared Tests for Goodness of Fit and Independence in Two-Way Tables, 1981.

[20] K. Pearson. On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling, 1900.

[21] Demetrios Kazakos, et al. The Bhattacharyya distance and detection between Markov chains, 1978, IEEE Trans. Inf. Theory.

[22] Ronitt Rubinfeld, et al. Testing random variables for independence and identity, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[23] Daniel M. Kane, et al. A New Approach for Testing Properties of Discrete Distributions, 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[24] Rocco A. Servedio, et al. Testing k-Modal Distributions: Optimal Algorithms via Reductions, 2011, SODA.

[25] Constantinos Daskalakis, et al. Optimal Testing for Properties of Distributions, 2015, NIPS.

[26] Alison L Gibbs, et al. On Choosing and Bounding Probability Metrics, 2002, math/0209021.

[27] Ronitt Rubinfeld. Taming big probability distributions, 2012, XRDS.

[28] David S. Moore, et al. The Effect of Dependence on Chi-Squared and Empiric Distribution Tests of Fit, 1983.

[29] Constantinos Daskalakis, et al. Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing, 2016, COLT.

[30] Clément L. Canonne, et al. A Survey on Distribution Testing: Your Data is Big. But is it Blue?, 2020, Electron. Colloquium Comput. Complex.

[31] Vincent Y. F. Tan, et al. Error exponents for composite hypothesis testing of Markov forest distributions, 2010, 2010 IEEE International Symposium on Information Theory.