Information Loss Due to the Data Reduction of Sample Data from Discrete Distributions

In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample yields a particular data set with the probability of the corresponding value of the statistic. We focus on sufficient statistics for the parameter of interest and derive a general, parameter-free formula for the Shannon information lost when a data sample is reduced to such a summary statistic. We also develop a measure of entropy for this lost information that depends only on the value of the statistic, not on the parameter or the data. Our approach would also work for non-sufficient statistics, but the lost information and the associated entropy would then involve the parameter. The method is applied to three well-known discrete distributions to illustrate its implementation.
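To make the idea concrete, the following sketch (an illustration of the general principle, not the paper's own formula) uses the familiar Bernoulli/binomial case: for n Bernoulli(p) trials, the sum T is sufficient for p, and given T = t the sample is uniformly distributed over the C(n, t) compatible orderings. The Shannon information lost by reporting only t is therefore log2 C(n, t) bits, a quantity that involves neither p nor the particular data set.

```python
from math import comb, log2

def lost_information_bits(n, t):
    """Bits lost when n Bernoulli trials are reduced to their sum t.

    Given T = t, the sample is uniform over the C(n, t) orderings,
    so every compatible data set loses log2 C(n, t) bits -- a value
    free of the success probability p, since T is sufficient.
    """
    return log2(comb(n, t))

# Reducing 10 trials with 4 successes to the sum t = 4
# discards log2 C(10, 4) = log2 210 bits.
print(lost_information_bits(10, 4))
```

Because the conditional distribution given t is uniform, this lost information coincides with its entropy, illustrating why the entropy measure can depend on the statistic's value alone.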