Shannon entropy: a rigorous notion at the crossroads between probability, information theory, dynamical systems and statistical physics

Statistical entropy was introduced by Shannon as a basic concept of information theory, measuring the average missing information in a random source. Extended to an entropy rate, it yields the bounds appearing in coding and compression theorems. In this paper, I describe how statistical entropy and entropy rate relate to other notions of entropy relevant to probability theory (the entropy of a discrete probability distribution, measuring its unevenness), computer science (algorithmic complexity), the ergodic theory of dynamical systems (Kolmogorov–Sinai or metric entropy) and statistical physics (Boltzmann entropy). Their mathematical foundations and correlates (the entropy concentration, Sanov, Shannon–McMillan–Breiman, Lempel–Ziv and Pesin theorems) clarify their interpretation and offer a rigorous basis for maximum entropy principles. Although often overlooked, these mathematical perspectives give entropy and relative entropy a central position in the statistical laws describing generic collective behaviours, and they provide insights into the notions of randomness, typicality and disorder. The relevance of entropy beyond the realm of physics, in particular for living systems and ecosystems, remains to be demonstrated.
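For reference, the two central quantities are recalled below in their standard textbook form (these are the definitions the paper builds on, not formulas quoted from its body). For a discrete distribution $p = (p_1, \dots, p_k)$ and a stationary source $(X_n)_{n \geq 1}$, the Shannon entropy and the entropy rate are

$$ H(p) = -\sum_{i=1}^{k} p_i \log p_i, \qquad h = \lim_{n \to \infty} \frac{1}{n}\, H(X_1, \dots, X_n), $$

where the limit exists for any stationary source. For a stationary ergodic source, the Shannon–McMillan–Breiman theorem mentioned in the abstract states that $-\frac{1}{n} \log p(X_1, \dots, X_n) \to h$ almost surely, which is the result underlying the coding and compression bounds.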
