Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory

We describe and develop a close relationship between two problems that have customarily been regarded as distinct: that of maximizing entropy, and that of minimizing worst-case expected loss. Using a formulation grounded in the equilibrium theory of zero-sum games between Decision Maker and Nature, these two problems are shown to be dual to each other, the solution to each providing that to the other. Although Topsøe described this connection for the Shannon entropy over 20 years ago, it does not appear to be widely known even in that important special case. We here generalize this theory to apply to arbitrary decision problems and loss functions. We indicate how an appropriate generalized definition of entropy can be associated with such a problem, and we show that, subject to certain regularity conditions, the above-mentioned duality continues to apply in this extended context. This simultaneously provides a possible rationale for maximizing entropy and a tool for finding robust Bayes acts. We also describe the essential identity between the problem of maximizing entropy and that of minimizing a related discrepancy or divergence between distributions. This leads to an extension, to arbitrary discrepancies, of a well-known minimax theorem for the case of Kullback-Leibler divergence (the redundancy-capacity theorem of information theory). For the important case of families of distributions having certain mean values specified, we develop simple sufficient conditions and methods for identifying the desired solutions. We use this theory to introduce a new concept of generalized exponential family linked to the specific decision problem under consideration, and we demonstrate that this shares many of the properties of standard exponential families. Finally, we show that the existence of an equilibrium in our game can be rephrased in terms of a Pythagorean property of the related divergence, thus generalizing previously announced results for Kullback-Leibler and Bregman divergences.
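To make the duality concrete, here is a minimal schematic rendering in our own notation (the symbols below are not quoted from the paper): for a loss function L(x, a), the generalized entropy of a distribution P is its Bayes risk,

    H(P) := \inf_a \mathrm{E}_{X \sim P}[L(X, a)],

and, for a family \Gamma of distributions satisfying the regularity conditions alluded to above, the duality asserts that

    \sup_{P \in \Gamma} H(P) = \sup_{P \in \Gamma} \inf_a \mathrm{E}_P[L(X, a)] = \inf_a \sup_{P \in \Gamma} \mathrm{E}_P[L(X, a)],

so that a maximum-entropy distribution in \Gamma and a robust Bayes (minimax) act form a saddle point of the zero-sum game between Decision Maker and Nature. With the logarithmic score L(x, Q) = -\log q(x), H(P) reduces to the Shannon entropy and the duality recovers Topsøe's result; with squared-error loss, H(P) is the variance of P, illustrating how each decision problem induces its own notion of entropy.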

[1] R. F., et al., Mathematical Statistics, 1944, Nature.

[2] G. Brier, Verification of Forecasts Expressed in Terms of Probability, 1950.

[3] D. Lindley, On a Measure of the Information Provided by an Experiment, 1956.

[4] E. Jaynes, Information Theory and Statistical Mechanics, 1957.

[5] S. Kullback, et al., Information Theory and Statistics, 1960.

[6] A. Rényi, On Measures of Entropy and Information, 1961.

[7] M. DeGroot, Uncertainty, Information, and Sequential Experiments, 1962.

[8] A. Feinstein, et al., Information and Information Stability of Random Variables and Processes, 1964.

[9] L. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, 1967.

[10] G. S. Rogers, et al., Mathematical Statistics: A Decision Theoretic Approach, 1967.

[11] J. Stoer, et al., Convexity and Optimization in Finite Dimensions I, 1970.

[12] M. DeGroot, Optimal Statistical Decisions, 1970.

[13] I. Csiszár, I-Divergence Geometry of Probability Distributions and Minimization Problems, 1975.

[14] E. C. Posner, et al., Random coding strategies for minimum entropy, 1975, IEEE Trans. Inf. Theory.

[15] 丸山 徹 (T. Maruyama), On Some Developments in Convex Analysis (in Japanese), 1977.

[16] J. Bernardo, Reference Posterior Distributions for Bayesian Inference, 1979.

[17] F. Topsøe, et al., Information-theoretical optimization techniques, 1979, Kybernetika.

[18] J. F. C. Kingman, et al., Information and Exponential Families in Statistical Theory, 1980.

[19] A. Leon-Garcia, et al., A source matching approach to finding minimax codes, 1980, IEEE Trans. Inf. Theory.

[20] B. C. van Fraassen, et al., A Problem for Relative Information Minimizers in Probability Kinematics, 1981.

[21] C. R. Rao, Diversity and dissimilarity coefficients: A unified approach, 1982.

[22] E. T. Jaynes, et al., Papers on Probability, Statistics and Statistical Physics, 1983.

[23] T. Seidenfeld, Entropy and Uncertainty, 1986, Philosophy of Science.

[24] J. Berger, Statistical Decision Theory and Bayesian Analysis, 1988.

[25] A. R. Barron, et al., Information-theoretic asymptotics of Bayes methods, 1990, IEEE Trans. Inf. Theory.

[26] C. L. Byrne, et al., General entropy criteria for inverse problems, with applications to data compression, pattern classification, and cluster analysis, 1990, IEEE Trans. Inf. Theory.

[27] P. Walley, Statistical Reasoning with Imprecise Probabilities, 1990.

[28] T. M. Cover, et al., Elements of Information Theory, 2005.

[29] I. Csiszár, Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems, 1991.

[30] A. Barron, et al., Jeffreys' prior is asymptotically least favorable under entropy risk, 1994.

[31] J. N. Kapur, et al., Entropy Optimization Principles with Applications, 1992.

[32] D. Stroock, et al., Probability Theory: An Analytic View, 1995, The Mathematical Gazette.

[33] A. Shimony, Search for a Naturalistic World View, 1993.

[34] J. Uffink, Can the maximum entropy principle be explained as a consistency requirement?, 1995.

[35] N. Merhav, et al., A strong version of the redundancy-capacity theorem of universal coding, 1995, IEEE Trans. Inf. Theory.

[36] J. Uffink, The constraint rule of the maximum entropy principle, 1996.

[37] J. M. Borwein, et al., Maximum Entropy Reconstruction Using Derivative Information, Part 1: Fisher Information and Convex Duality, 1996, Math. Oper. Res.

[38] R. Wiedenbrueck, A Minimax Result for the Kullback Leibler Bayes Risk, 1997.

[39] D. Haussler, et al., A general minimax result for relative entropy, 1997, IEEE Trans. Inf. Theory.

[40] Y. Censor, et al., Parallel Optimization: Theory, Algorithms, and Applications, 1997.

[41] J. Krob, et al., A Minimax Result for the Kullback Leibler Bayes Risk, 1997.

[42] J. Uffink, Can the Maximum Entropy Principle Be Explained as a Consistency Requirement?, 1997.

[43] P. Grünwald, The Minimum Description Length Principle and Reasoning under Uncertainty, 1998.

[44] P. Grünwald, The minimum description length principle and reasoning under uncertainty, 1998.

[45] H. Scholl, Shannon-Optimal Priors on iid Statistical Experiments Converge Weakly to Jeffreys Prior, 1999.

[46] J. Lafferty, Additive models, boosting, and inference for generalized divergences, 1999, COLT '99.

[47] M. K. Warmuth, et al., Boosting as entropy projection, 1999, COLT '99.

[48] P. Sebastiani, et al., Coherent dispersion criteria for optimal experimental design, 1999.

[49] A. R. Barron, et al., Asymptotic minimax regret for data compression, gambling, and prediction, 1997, IEEE Trans. Inf. Theory.

[50] D. Ríos Insua, et al., Robust Bayesian Analysis, 2000.

[51] A. Stachurski, Parallel Optimization: Theory, Algorithms and Applications, 2000, Parallel Distributed Comput. Pract.

[52] B. Vidakovic, Γ-Minimax: A Paradigm for Conservative Robust Bayesians, 2000.

[53] S. Della Pietra, et al., Duality and Auxiliary Functions for Bregman Distances, 2001.

[54] F. Topsøe, et al., Basic Concepts, Identities and Inequalities - the Toolkit of Information Theory, 2001, Entropy.

[55] P. Harremoës, et al., Maximum Entropy Fundamentals, 2001, Entropy.

[56] W. Seidel, An algorithm for calculating Γ-minimax decision rules under generalized moment conditions, 2001.

[57] F. Topsøe, Unified approach to optimization techniques in Shannon theory, 2002, Proceedings IEEE International Symposium on Information Theory.

[58] A. P. Dawid, et al., Game theory, maximum generalized entropy, minimum discrepancy, robust Bayes and Pythagoras, 2002, Proceedings of the IEEE Information Theory Workshop.

[59] F. Topsøe, et al., Maximum entropy versus minimum risk and applications to some classical discrete distributions, 2002, IEEE Trans. Inf. Theory.

[60] N. Bingham, Probability Theory: An Analytic View, 2002.

[61] A. Shimony, The status of the principle of maximum entropy, 1985, Synthese.

[62] E. T. Jaynes, et al., Some random observations, 1985, Synthese.

[63] M. K. Warmuth, et al., Relative Loss Bounds for On-Line Density Estimation with the Exponential Family of Distributions, 1999, Machine Learning.

[64] B. Skyrms, Maximum entropy inference as a special case of conditionalization, 1985, Synthese.