Paper information - Adaptive prior probabilities via optimization of risk and entropy

An agent choosing between several actions tends to take the one with the lowest loss. But this choice is arguably too rigid (not adaptive) to be useful in complex situations, e.g. where the exploration-exploitation trade-off is relevant, or in creative task solving. Here we study an agent that, given a certain average utility invested into adaptation, chooses his actions via probabilities obtained by optimizing the entropy. As we argue, entropy minimization corresponds to a risk-averse agent, whereas a risk-seeking agent will maximize the entropy. Entropy minimization can (under certain conditions) recover the epsilon-greedy probabilities known in reinforcement learning. We show that entropy minimization, in contrast to entropy maximization, leads to rudimentary forms of intelligent behavior: (i) the agent accounts for extreme events, especially when he did not invest much into adaptation; (ii) he chooses the action with the smaller loss (the lesser of two evils) when confronted with two actions with comparable losses; (iii) the agent is subject to effects similar to cognitive dissonance and frustration. None of these features is shown by the risk-seeking agent, whose probabilities are given by the maximum entropy. Mathematically, the difference between entropy maximization and entropy minimization corresponds to maximizing a concave function over a convex domain (convex programming) versus minimizing it (concave programming).
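As a rough illustration of the comparison the abstract describes, the sketch below builds epsilon-greedy probabilities for a small set of losses and then finds the maximum-entropy distribution with the same average loss. It is not the paper's formulation: treating the constraint as a fixed average loss, the Gibbs form `p_k ∝ exp(-beta * loss_k)` for the maximum-entropy solution, the bisection solver, the function names, and the example losses are all assumptions made here for illustration. The only property it demonstrates is the standard one that, at equal average loss, the maximum-entropy distribution has entropy at least as large as any other distribution, including the epsilon-greedy one.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)


def mean_loss(p, losses):
    """Average loss under probabilities p."""
    return sum(q * l for q, l in zip(p, losses))


def epsilon_greedy(losses, eps):
    """With prob. 1 - eps take the lowest-loss action, else pick uniformly."""
    n = len(losses)
    best = min(range(n), key=lambda k: losses[k])
    p = [eps / n] * n
    p[best] += 1.0 - eps
    return p


def gibbs(losses, beta):
    """Distribution p_k proportional to exp(-beta * loss_k)."""
    lo = min(losses)  # shift losses for numerical stability
    w = [math.exp(-beta * (l - lo)) for l in losses]
    z = sum(w)
    return [x / z for x in w]


def max_entropy_at_mean_loss(losses, target, beta_hi=1e3, iters=200):
    """Bisection on beta >= 0 so that the Gibbs mean loss matches target.

    Assumes min(losses) < target <= uniform average of the losses,
    where the mean loss decreases monotonically in beta.
    """
    lo_b, hi_b = 0.0, beta_hi
    for _ in range(iters):
        mid = 0.5 * (lo_b + hi_b)
        if mean_loss(gibbs(losses, mid), losses) > target:
            lo_b = mid  # mean loss still too high: increase beta
        else:
            hi_b = mid
    return gibbs(losses, 0.5 * (lo_b + hi_b))


losses = [1.0, 2.0, 5.0]  # hypothetical losses for three actions
p_eg = epsilon_greedy(losses, eps=0.3)
p_me = max_entropy_at_mean_loss(losses, mean_loss(p_eg, losses))

print("epsilon-greedy:", p_eg, "entropy:", entropy(p_eg))
print("max-entropy:   ", p_me, "entropy:", entropy(p_me))
```

Both distributions spend the same average loss, but the epsilon-greedy one concentrates more weight on the best action and spreads the remainder evenly, which gives it lower entropy than the Gibbs distribution.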
