Using data compressors to construct order tests for homogeneity and component independence

Abstract: We propose nonparametric order tests for homogeneity and component independence that are based on data compressors. For homogeneity testing, the idea is to compress the word obtained by sorting the combined samples and writing, in place of each element, the number of the sample it came from. H0 is rejected if this word is compressed to a certain degree and accepted otherwise. We show that such a test, when obtained from an ideal data compressor, is valid against all alternatives. Testing component independence is reduced to homogeneity testing.
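The homogeneity construction described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's exact procedure: zlib stands in for the ideal data compressor, the data are assumed to have no ties, and the function names (label_word, compressed_bits, reject_h0) are hypothetical. The rejection threshold is also an assumption. It comes from a simple counting argument: under H0 all words with the given sample sizes are equally likely, and fewer than 2^(t+1) of them can have a lossless code of at most t bits.

```python
import math
import zlib


def label_word(x, y):
    """Sort the pooled samples and record which sample each element came from.

    Elements of x are written as "0" and elements of y as "1".
    Assumes there are no ties between the two samples.
    """
    pooled = [(v, "0") for v in x] + [(v, "1") for v in y]
    pooled.sort(key=lambda pair: pair[0])
    return "".join(label for _, label in pooled)


def compressed_bits(word):
    """Length in bits of the zlib-compressed word (zlib is a stand-in compressor)."""
    return 8 * len(zlib.compress(word.encode("ascii"), 9))


def reject_h0(x, y, alpha=0.05):
    """Reject homogeneity if the label word compresses below a counting-based threshold.

    Assumed threshold: log2(C(n, n1)) + log2(alpha) - 1 bits, where n is the
    pooled sample size and n1 the size of the first sample.
    """
    word = label_word(x, y)
    log2_num_words = math.log2(math.comb(len(word), len(x)))
    threshold = log2_num_words + math.log2(alpha) - 1
    return compressed_bits(word) <= threshold
```

With two completely separated samples the word is a run of "0" followed by a run of "1", which zlib shrinks to a handful of bytes, so H0 is rejected. With two samples drawn from the same distribution the word looks like a random binary string, and a general-purpose compressor such as zlib cannot get it below the threshold. A practical compressor is far from ideal, so this sketch will miss many alternatives that an ideal compressor would detect.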
