Competitive Closeness Testing

We test whether two sequences are generated by the same distribution or by two different ones. Unlike previous work, we make no assumptions on the distributions' support size. Additionally, we compare our performance to that of the best possible test. We describe an efficiently computable algorithm based on pattern maximum likelihood that is near optimal whenever the best possible error probability is at most $\exp(-14 n^{2/3})$ using length-$n$ sequences.
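
As a small illustration of the objects behind pattern maximum likelihood, the sketch below computes the pattern of a sequence, in which each symbol is replaced by the order of its first appearance. It does not reproduce the test described above. The function names `pattern` and `joint_pattern` are hypothetical, and forming the joint pattern of the two sequences by concatenating them is an assumption made here for illustration.

```python
def pattern(seq):
    """Replace each symbol by the 1-based order of its first appearance."""
    first_seen = {}
    out = []
    for symbol in seq:
        if symbol not in first_seen:
            # A new symbol gets the next unused index.
            first_seen[symbol] = len(first_seen) + 1
        out.append(first_seen[symbol])
    return out


def joint_pattern(x, y):
    """Pattern of the pair (x, y), indexed over the concatenation x followed by y.

    Assumption for illustration: symbols shared by x and y receive the same index.
    """
    combined = pattern(list(x) + list(y))
    return combined[:len(x)], combined[len(x):]


if __name__ == "__main__":
    print(pattern("abracadabra"))
    print(joint_pattern("aba", "bcc"))
```

The pattern discards the symbols' identities and keeps only their repetition structure, which is why a pattern-based statistic needs no assumption on the support size.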
