Applications of Universal Source Coding to Statistical Analysis of Time Series

We show how universal codes can be used to solve some of the most important statistical problems for time series. By definition, a universal code (or universal lossless data compressor) compresses any sequence generated by a stationary ergodic source asymptotically to the Shannon entropy, which is the best achievable rate for lossless data compression. We consider finite-alphabet and real-valued time series and address the following problems: estimation of the limiting probabilities for finite-alphabet series and of the density for real-valued series; on-line prediction, regression, and classification (problems with side information) for both kinds of series; and two hypothesis-testing problems, namely goodness-of-fit (identity) testing and testing of serial independence. All problems are treated within the framework of classical mathematical statistics, while everyday data-compression methods (archivers) serve as the practical tool for the estimation and testing. In practice, the suggested methods and tests often turn out to be more powerful than known ones.
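As a concrete illustration of how an off-the-shelf archiver can serve as a statistical tool, the following is a minimal sketch of a compression-based goodness-of-fit (identity) test against the hypothesis of an i.i.d. uniform source. It is written in Python, with the standard bz2 compressor standing in for a universal code; the function name, parameters, and threshold choice are illustrative assumptions rather than the exact construction from the paper.

    import bz2
    import math
    import os

    def identity_test_uniform(data: bytes, alphabet_size: int = 256, alpha: float = 0.01) -> bool:
        """Return True if the hypothesis that `data` is i.i.d. uniform is rejected.

        Sketch of a compression-based identity test: a lossless (prefix) code can
        undercut the ideal code length under H0 by more than t bits only with
        probability at most about 2^(-t), so t = -log2(alpha) gives a test of
        level roughly alpha.
        """
        n = len(data)
        h0_bits = n * math.log2(alphabet_size)    # ideal code length under H0, in bits
        code_bits = 8 * len(bz2.compress(data))   # code length achieved by the archiver
        t = -math.log2(alpha)                     # critical value for level alpha
        return h0_bits - code_bits > t

    # Usage: truly random bytes are (almost) never rejected, while a periodic
    # sequence is rejected because the compressor shortens it drastically.
    print(identity_test_uniform(os.urandom(10000)))   # expected: False
    print(identity_test_uniform(b"ab" * 5000))        # expected: True

The same idea extends, under suitable assumptions, to other null hypotheses: replace n * log2(alphabet_size) by the code length that the hypothesized process assigns to the observed sequence and compare it with the length produced by the compressor.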
