A new complexity measure for time series analysis and classification

Complexity measures are used in a number of applications, including extraction of information from data such as ecological time series, detection of non-random structure in biomedical signals, testing of random number generators, language recognition, and authorship attribution. Complexity measures proposed in the literature, such as Shannon entropy, relative entropy, Lempel-Ziv, Kolmogorov and algorithmic complexity, are mostly ineffective in analyzing short sequences that are further corrupted with noise. To address this problem, we propose a new complexity measure, ETC, and define it as the “Effort To Compress” the input sequence by a lossless compression algorithm. Here, we employ the lossless compression algorithm known as Non-Sequential Recursive Pair Substitution (NSRPS) and define ETC as the number of iterations needed for NSRPS to transform the input sequence into a constant sequence. We demonstrate the utility of ETC in two applications. First, ETC is shown to correlate better with the Lyapunov exponent than Shannon entropy does, even for relatively short and noisy time series. Second, the measure has a greater rate of success in automatic identification and classification of short noisy sequences, compared to entropy and a popular measure based on Lempel-Ziv compression (implemented by Gzip).
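The definition of ETC above can be illustrated with a minimal Python sketch. It assumes that each NSRPS iteration replaces the most frequent pair of adjacent symbols with a single new symbol, scanning left to right without overlap. The function names `nsrps_step` and `etc`, the integer symbol encoding, the overlap-inclusive pair counting, and the tie-breaking rule (first pair encountered wins) are illustrative assumptions, not details taken from the abstract.

```python
from collections import Counter


def nsrps_step(seq):
    """Apply one pair-substitution pass to a list of integer symbols.

    Assumption: the most frequent adjacent pair is replaced everywhere
    (left to right, non-overlapping) by one fresh symbol.
    """
    # Count adjacent pairs (overlapping occurrences included; an assumption).
    pairs = Counter(zip(seq, seq[1:]))
    # Ties go to the pair seen first, since Counter keeps insertion order.
    target = max(pairs, key=pairs.get)
    new_symbol = max(seq) + 1  # a symbol not yet present in the sequence

    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == target:
            out.append(new_symbol)
            i += 2  # skip both symbols of the replaced pair
        else:
            out.append(seq[i])
            i += 1
    return out


def etc(seq):
    """Count the passes needed until the sequence becomes constant."""
    seq = list(seq)
    steps = 0
    # A sequence with at most one distinct symbol is already constant.
    while len(set(seq)) > 1:
        seq = nsrps_step(seq)
        steps += 1
    return steps


if __name__ == "__main__":
    print(etc([0, 1, 0, 1, 0, 1, 0, 1]))  # periodic input
    print(etc([1, 1, 0, 1, 0, 0, 1, 0]))  # less regular input
```

Under these assumptions, the periodic input becomes constant after a single pass, while the less regular input of the same length needs five passes, which matches the intuition that a more complex sequence takes more effort to compress.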
