Belief and Truth in Hypothesised Behaviours

There is a long history in game theory on the topic of Bayesian or "rational" learning, in which each player maintains beliefs over a set of alternative behaviours, or types, for the other players. This idea has gained increasing interest in the artificial intelligence (AI) community, where it is used to control a single agent in a system composed of multiple agents with unknown behaviours. The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to the types we believe are most likely, given the observed actions of the agents. The game theory literature studies this idea primarily in the context of equilibrium attainment. In contrast, many AI applications focus on task completion and payoff maximisation. With this perspective in mind, we identify and address a spectrum of questions pertaining to belief and truth in hypothesised types. We formulate three basic ways to incorporate evidence into posterior beliefs and show when the resulting beliefs are correct, and when they may fail to be. Moreover, we demonstrate that prior beliefs can have a significant impact on our ability to maximise payoffs in the long term, and that they can be computed automatically with consistent performance effects. Furthermore, we analyse the conditions under which we are able to complete our task optimally, despite inaccuracies in the hypothesised types. Finally, we show how the correctness of hypothesised types can be ascertained during the interaction via an automated statistical analysis.
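To make the belief-update idea concrete, the following is a minimal sketch (not the paper's implementation) of a Bayesian posterior over a hypothesised set of types, where each type assigns probabilities to the other agent's actions and the posterior weights each type by the likelihood of the observed action history. The type names, the uniform prior, and the observation format are illustrative assumptions.

```python
def update_beliefs(prior, types, history):
    """Return posterior P(type | history): prior times the likelihood each
    hypothesised type assigns to the observed actions, then normalised."""
    posterior = {}
    for name, policy in types.items():
        # Likelihood of the observed actions under this hypothesised type.
        likelihood = 1.0
        for state, action in history:
            likelihood *= policy(state).get(action, 0.0)
        posterior[name] = prior[name] * likelihood
    total = sum(posterior.values())
    if total == 0.0:
        return dict(prior)  # no type explains the data; keep the prior beliefs
    return {name: p / total for name, p in posterior.items()}

# Illustrative example: two hypothesised types for a two-action interaction.
types = {
    "cooperator": lambda s: {"C": 0.9, "D": 0.1},
    "defector":   lambda s: {"C": 0.1, "D": 0.9},
}
prior = {"cooperator": 0.5, "defector": 0.5}
history = [(None, "C"), (None, "C"), (None, "D")]
print(update_beliefs(prior, types, history))  # belief shifts towards "cooperator"
```

This product-of-likelihoods posterior is only one of the ways evidence can be incorporated; the paper's other variants and the automated computation of priors are not shown here.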
