Testing from One Sample: Is the casino really using a riffle shuffle?

Classical distribution testing assumes access to i.i.d. samples from the distributions being tested. We initiate the study of Markov chain testing, assuming access to a single sample from the Markov chains being tested. In particular, we observe a single trajectory $X_0, \ldots, X_t, \ldots$ of an unknown Markov chain $M$, and we do not even control the distribution of the starting state $X_0$. Our goal is to test whether $M$ is identical to a model Markov chain $M'$. In the first part of the paper, we propose a measure of difference between two Markov chains that captures how the total variation distance between words sampled from the two chains scales as the length of these words grows. We provide efficient testers with near-optimal sample complexity for identity testing under this measure of difference. In the second part of the paper, we study Markov chains whose state space is exponential in the size of their description, and we provide testers for identity of card shuffles. We apply our results to test the validity of the Gilbert-Shannon-Reeds model for the riffle shuffle.

Supported by a Microsoft Research Faculty Fellowship and NSF Awards CCF-1551875, CCF-1617730, and CCF-1650733.

Supported by NSF Awards CCF-1551875, CCF-1617730, and CCF-1650733.

arXiv:1704.06850v1 [cs.LG] 22 Apr 2017
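The abstract does not define the proposed measure of difference. A minimal sketch, assuming the Bhattacharyya coefficient as a stand-in: if two chains with transition matrices $P$ and $Q$ start from the same distribution $s$, the Bhattacharyya coefficient between their words $(X_0, \ldots, X_t)$ is $BC_t = s^\top R^t \mathbf{1}$, where $R_{ij} = \sqrt{P_{ij} Q_{ij}}$. Since $1 - BC_t \le d_{TV} \le \sqrt{1 - BC_t^2}$, the total variation distance between words approaches 1 at a rate governed by the spectral radius $\rho(R)$, so $1 - \rho(R)$ is used here as an illustrative per-step difference. The function names are hypothetical, and the power iteration assumes $R$ is irreducible and aperiodic.

```python
import math

# Hypothetical helpers: the Bhattacharyya coefficient is an assumed stand-in
# for the paper's measure, which the abstract does not spell out.

def hadamard_sqrt(P, Q):
    """Entrywise sqrt(P_ij * Q_ij) for two row-stochastic matrices."""
    return [[math.sqrt(p * q) for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def mat_vec(R, v):
    """Plain matrix-vector product R v."""
    return [sum(r * x for r, x in zip(row, v)) for row in R]

def bhattacharyya_words(P, Q, s, t):
    """BC between the laws of (X_0, ..., X_t) under P and Q, both started from s.

    Uses BC_t = s^T R^t 1 with R = sqrt(P * Q) taken entrywise.
    """
    R = hadamard_sqrt(P, Q)
    v = [1.0] * len(R)
    for _ in range(t):
        v = mat_vec(R, v)
    return sum(si * vi for si, vi in zip(s, v))

def spectral_radius(R, iters=1000):
    """Perron root of a nonnegative primitive matrix via power iteration."""
    v = [1.0] * len(R)
    rho = 0.0
    for _ in range(iters):
        w = mat_vec(R, v)
        rho = max(w)  # v is normalized so that max(v) == 1
        v = [x / rho for x in w]
    return rho

P = [[0.9, 0.1], [0.1, 0.9]]
Q = [[0.8, 0.2], [0.2, 0.8]]
s = [0.5, 0.5]

rho = spectral_radius(hadamard_sqrt(P, Q))
print(f"1 - rho(R) = {1 - rho:.4f}")
for t in (10, 100):
    bc = bhattacharyya_words(P, Q, s, t)
    print(t, f"TV >= {1 - bc:.3f}", f"TV <= {math.sqrt(1 - bc * bc):.3f}")
```

For these two chains, $R$ has equal row sums $0.7\sqrt{2}$, so $BC_t = (0.7\sqrt{2})^t$ exactly and $1 - \rho(R) \approx 0.0101$: although each row differs by only 0.1, the lower bound $1 - BC_t$ on the total variation distance between words tends to 1 as $t$ grows.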

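The abstract names the Gilbert-Shannon-Reeds (GSR) model without defining it. The sketch below assumes its standard description: cut an $n$-card deck into two packets with a Binomial$(n, 1/2)$ cut size, then build the new deck by taking the next card from a packet with probability proportional to that packet's current size, which makes every interleaving equally likely. It also uses the known fact that one GSR shuffle yields a permutation with at most two rising sequences, with probability $(n+1)/2^n$ for the identity and $1/2^n$ for each other such permutation. The helper names (`gsr_riffle`, `rising_sequences`, `gsr_log_likelihood`) are hypothetical, and the likelihood check is a simple sanity test on one trajectory of deck orders, not the tester developed in the paper.

```python
import math
import random

def gsr_riffle(deck, rng):
    """One GSR riffle shuffle of a list (top of the deck first)."""
    n = len(deck)
    k = sum(rng.random() < 0.5 for _ in range(n))  # cut size ~ Binomial(n, 1/2)
    left, right = deck[:k], deck[k:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        a, b = len(left) - i, len(right) - j
        # Take the next card from a packet with probability proportional to its size.
        if rng.random() < a / (a + b):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out

def rising_sequences(before, after):
    """Number of rising sequences of the permutation taking `before` to `after`."""
    pos_before = {card: p for p, card in enumerate(before)}
    rel = [pos_before[c] for c in after]  # relabel cards by their old position
    pos = [0] * len(rel)
    for p, v in enumerate(rel):
        pos[v] = p
    # Each time card v + 1 sits above card v, a new rising sequence starts.
    return 1 + sum(pos[v + 1] < pos[v] for v in range(len(rel) - 1))

def gsr_log_likelihood(trajectory):
    """Log-probability of the observed transitions under the GSR model.

    Returns -inf as soon as one step has 3 or more rising sequences,
    which a single GSR shuffle can never produce.
    """
    total = 0.0
    for before, after in zip(trajectory, trajectory[1:]):
        n = len(before)
        r = rising_sequences(before, after)
        if r >= 3:
            return -math.inf
        total += (math.log(n + 1) if r == 1 else 0.0) - n * math.log(2)
    return total

rng = random.Random(0)
deck = list(range(52))
traj = [deck]
for _ in range(5):
    traj.append(gsr_riffle(traj[-1], rng))
print("GSR trajectory log-likelihood:", gsr_log_likelihood(traj))
print("reversed deck:", gsr_log_likelihood([deck, deck[::-1]]))  # -inf
```

Reversing the deck produces 52 rising sequences, so the check assigns it zero likelihood under the GSR model. Each step costs time linear in the deck size, even though the chain's state space has $52!$ orderings.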