Dynamic non-Bayesian decision making in multi-agent systems

We consider a group of non-Bayesian agents that can fully coordinate their activities and share their past experience in order to achieve a joint goal in the face of uncertainty. The reward obtained by each agent is a function of the environment state but not of the actions taken by the other agents in the group. The environment state (controlled by Nature) may change arbitrarily, and the reward function is initially unknown. Two basic feedback structures are considered. In one of them, the perfect monitoring case, the agents are able to observe the previous environment state as part of their feedback; in the other, the imperfect monitoring case, all that is available to the agents are the rewards obtained. Both settings are partially observable processes in which the current environment state is unknown. Our study adopts the competitive-ratio criterion. We show that, in the imperfect monitoring case, there exists an efficient stochastic policy ensuring that the competitive ratio is obtained for all agents at almost all stages with arbitrarily high probability, where efficiency is measured in terms of the rate of convergence. We also show that if the agents are restricted to deterministic policies, no such policy exists, even in the perfect monitoring case.
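For concreteness, one standard formalization of the competitive-ratio criterion for a one-shot decision problem is the following; the action set A, state set S, and strictly positive reward function u are our notation for illustration, and the paper's exact multi-stage definitions may differ:

\[
\mathrm{CR}(a) \;=\; \min_{s \in S} \frac{u(a,s)}{\max_{a' \in A} u(a',s)},
\qquad
r^{*} \;=\; \max_{a \in A} \mathrm{CR}(a).
\]

Informally, r^{*} is the largest fraction of the best attainable reward that some action can guarantee no matter which state Nature selects; the results above concern policies that secure roughly this fraction of the optimum at almost all stages of the repeated interaction.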
