Estimating the Entropy of Binary Time Series: Methodology, Some Theory and a Simulation Study

Abstract: Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: the plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator. METHODOLOGY: Three new entropy estimators are introduced: two new LZ-based estimators, and the “renewal entropy estimator,” which is tailored to data generated by a binary renewal process. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters in practice. THEORY: We prove that, unlike their earlier versions, the two new LZ-based estimators are universally consistent, that is, they converge to the entropy rate for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state hidden Markov model (HMM) with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator.
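To make the comparison concrete, the following is a minimal sketch of two of the estimator families named in the abstract, specialized to binary data: the plug-in (block) estimator and a match-length LZ-type estimator. This is illustrative code, not the authors' implementation; the LZ variant shown is the standard increasing-window match-length estimator based on the Wyner-Ziv/Ornstein-Weiss match-length asymptotics, not necessarily either of the paper's two new versions, and the function names (`plugin_entropy_rate`, `lz_match_length_entropy`) are our own.

```python
# Minimal sketch, assuming binary data given as a list of 0/1 ints and base-2
# logarithms, so all estimates are in bits per symbol. Not the paper's code.

import math
from collections import Counter


def plugin_entropy_rate(x, w):
    """Plug-in estimate: empirical entropy of w-blocks, divided by w."""
    total = len(x) - w + 1
    counts = Counter(tuple(x[i:i + w]) for i in range(total))
    h_w = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h_w / w


def lz_match_length_entropy(x):
    """Match-length (LZ-type) estimate: L_i is the length of the shortest
    substring starting at position i that does not occur in the past x[0:i].
    For stationary ergodic data L_i grows like log2(i) / H, so the estimate
    is the reciprocal of the average of L_i / log2(i). (The paper's LZ
    estimators differ in window and overlap conventions; this sketch uses
    the simple strictly-in-the-past convention.)"""
    s = "".join(map(str, x))
    n = len(s)
    ratios = []
    for i in range(2, n):  # start at i = 2 so that log2(i) > 0
        l = 1
        while i + l <= n and s[i:i + l] in s[:i]:
            l += 1
        ratios.append(l / math.log2(i))
    return len(ratios) / sum(ratios)


if __name__ == "__main__":
    import random
    random.seed(0)
    # i.i.d. Bernoulli(0.3) bits: true entropy rate is about 0.881 bits/symbol
    x = [1 if random.random() < 0.3 else 0 for _ in range(4000)]
    print("plug-in, w=5 :", plugin_entropy_rate(x, 5))
    print("LZ match-len :", lz_match_length_entropy(x[:1500]))  # O(n^2) sketch
```

Both estimates are biased on short sequences (the plug-in downward with block length w, the match-length estimator through the slow log2(i) convergence), which is exactly the kind of effect the bias and standard-error analysis in the paper quantifies; the naive substring search also makes the LZ sketch quadratic, so a serious implementation would use a suffix tree or similar index.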
