Coverage-adjusted entropy estimation.

Data on 'neural coding' have frequently been analyzed using information-theoretic measures. These formulations involve the fundamental and generally difficult statistical problem of estimating entropy. We briefly review several methods that have been advanced to estimate entropy and highlight one, the coverage-adjusted entropy estimator (CAE) due to Chao and Shen, which appeared recently in the environmental statistics literature. This method begins with the elementary Horvitz-Thompson estimator, developed for sampling from a finite population, and adjusts for the potential new species that have not yet been observed in the sample; these become the new patterns or 'words' in a spike train that have not yet been observed. The adjustment is due to I. J. Good and is called the Good-Turing coverage estimate. We provide a new empirical regularization derivation of the coverage-adjusted probability estimator, which shrinks the maximum likelihood estimate. We prove that the CAE is consistent and first-order optimal, with rate O_P(1/log n), in the class of distributions with finite entropy variance and that, within the class of distributions with finite qth moment of the log-likelihood, the Good-Turing coverage estimate and the total probability of unobserved words converge at rate O_P(1/(log n)^q). We then provide a simulation study of the estimator with standard distributions and examples from neuronal data, where observations are dependent. The results show that, with a minor modification, the CAE performs much better than the MLE and is better than the best upper bound estimator, due to Paninski, when the number of possible words m is unknown or infinite.
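For concreteness, the following minimal Python sketch computes the plug-in (maximum likelihood) entropy estimate and the Chao-Shen coverage-adjusted estimate from a sample of discrete observations. The function names, the toy data, and the guard used when every observed word is a singleton are illustrative assumptions, not the authors' implementation; in particular the guard is not necessarily the 'minor modification' referred to above.

import math
from collections import Counter

def mle_entropy(samples):
    """Plug-in (maximum likelihood) entropy estimate, in nats."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def coverage_adjusted_entropy(samples):
    """Chao-Shen coverage-adjusted entropy estimate (CAE), in nats.

    Combines the Good-Turing estimate of sample coverage with a
    Horvitz-Thompson correction for words not seen in the sample.
    """
    n = len(samples)
    counts = Counter(samples)
    f1 = sum(1 for c in counts.values() if c == 1)  # number of singleton words
    if f1 == n:                       # assumed guard: avoid zero coverage
        f1 = n - 1
    coverage = 1.0 - f1 / n           # Good-Turing coverage estimate
    h = 0.0
    for c in counts.values():
        p = coverage * c / n          # coverage-adjusted (shrunken) probability
        # Horvitz-Thompson weight: inverse of the probability that this
        # word appears at least once in a sample of size n
        h -= p * math.log(p) / (1.0 - (1.0 - p) ** n)
    return h

if __name__ == "__main__":
    import random
    random.seed(0)
    # toy heavy-tailed source with many rarely observed 'words'
    words = list(range(200))
    weights = [1.0 / (k + 1) for k in words]
    sample = random.choices(words, weights=weights, k=500)
    print("MLE:", mle_entropy(sample))
    print("CAE:", coverage_adjusted_entropy(sample))

The shrinkage of the empirical probabilities by the Good-Turing coverage factor and the Horvitz-Thompson weighting are what distinguish the CAE from the plug-in estimate.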

[1] W. McCulloch, et al. The limiting information capacity of a neuronal link, 1952.

[2] William Bialek, et al. Entropy and information in neural spike trains: progress on the sampling problem, 2003, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[3] G. Basharin, On a Statistical Estimate for the Entropy of a Sequence of Independent Random Variables, 1959.

[4] Alexander Borst, et al. Information theory and neural coding, 1999, Nature Neuroscience.

[5] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[6] William Bialek, et al. Spikes: Exploring the Neural Code, 1996.

[7] I. Goudie, et al. Coverage-adjusted estimators for mark-recapture in heterogeneous populations, 2000.

[8] A. Chao, et al. Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample, 2004, Environmental and Ecological Statistics.

[9] Peter Bühlmann, et al. Variable Length Markov Chains: Methodology, Computing, and Software, 2004.

[10] G. A. Miller, et al. Note on the bias of information estimates, 1955.

[11] C. E. Shannon, A mathematical theory of communication, 1948, Bell System Technical Journal.

[12] S. Zahl, et al. Jackknifing an index of diversity, 1977.

[13] Alon Orlitsky, et al. On Modeling Profiles Instead of Values, 2004, UAI.

[14] Jonathan D. Victor, et al. Asymptotic Bias in Information Estimates and the Exponential (Bell) Polynomials, 2000, Neural Computation.

[15] William Bialek, et al. Entropy and Information in Neural Spike Trains, 1996, cond-mat/9603127.

[16] Sarah M. N. Woolley, et al. Modulation Power and Phase Spectrum of Natural Sounds Enhance Neural Encoding Performed by Single Auditory Neurons, 2004, The Journal of Neuroscience.

[17] T. H. Bullock, et al. Neural coding: A report based on an NRP work session, 1968.

[18] F. H. Adler, Cybernetics, or Control and Communication in the Animal and the Machine, 1949.

[19] David R. Wolf, et al. Estimating functions of probability distributions from a finite set of samples, 1994, Physical Review E: Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics.

[21] Liam Paninski, et al. Estimation of Entropy and Mutual Information, 2003, Neural Computation.

[22] Peter J. Bickel, et al. On estimating the total probability of the unobserved outcomes of an experiment, 1986.

[23] D. Horvitz, et al. A Generalization of Sampling Without Replacement from a Finite Universe, 1952.

[24] David A. McAllester, et al. On the Convergence Rate of Good-Turing Estimators, 2000, COLT.

[25] A. Antos, et al. Convergence properties of functional estimates for discrete distributions, 2001.

[26] Peter Bühlmann, et al. Variable length Markov chains: methodology, computing and software, 2002.

[27] I. Good, The population frequencies of species and the estimation of population parameters, 1953.