A Bayesian approach to multiagent reinforcement learning and coalition formation under uncertainty

Sequential decision making under uncertainty is a persistent challenge for autonomous agents in a multiagent environment, since their behaviour is inevitably influenced by the behaviour of others. Further, agents must constantly balance exploiting their current information about the environment and its other inhabitants against exploring to acquire additional information. Moreover, they need to trade off short-term rewards against anticipated long-term ones while learning, through interaction, about the environment and about others. To do so they employ techniques from reinforcement learning (RL), a fundamental area of study within artificial intelligence (AI). Coalition formation is a problem of great interest within game theory and AI, allowing autonomous, individually rational agents to form stable or transient teams (or coalitions) to tackle an underlying task. Agents participating in realistic scenarios of repeated coalition formation under uncertainty face the issues identified above, and need to bargain to successfully negotiate the terms of their participation in coalitions, often having to reconcile individual welfare with team welfare. In this thesis, we provide theoretical and algorithmic tools to accommodate sequential decision making under uncertainty in multiagent settings, dealing with the issues above. Specifically, we combine multiagent Bayesian RL with game-theoretic ideas to facilitate the agents' sequential decision making. We deal with popular multiagent problems that had not previously been tackled under uncertainty, or more specifically under type uncertainty. In our work, we assume that the environment dynamics or the types (capabilities) of other agents are not known, and thus the agents have to account for this uncertainty, in a Bayesian way, when making decisions.
Handling type uncertainty allows information about others acquired within one setting to be exploited in possibly different settings in the future. The core of our contributions lies in the area of coalition formation under uncertainty. We study several aspects of both the cooperative and non-cooperative facets of this problem: we coin new theoretical concepts, prove theoretical results, present and evaluate algorithms for use in this context, and propose a Bayesian RL framework for optimal repeated coalition formation under uncertainty.
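
As a rough illustration of what accounting for type uncertainty "in a Bayesian way" can mean, the sketch below updates an agent's belief over a partner's type after observing the outcome of a joint task. It is a minimal example under stated assumptions, not the thesis's algorithm: the type names, the likelihood values, and the function `update_type_belief` are all hypothetical.

```python
from typing import Dict


def update_type_belief(prior: Dict[str, float],
                       likelihood: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' rule: posterior(t) is proportional to prior(t) * P(obs | t).

    `prior` maps each candidate type to its current probability;
    `likelihood` maps each type to the probability of the observed outcome
    given that type. Both are illustrative assumptions.
    """
    unnormalized = {t: prior[t] * likelihood[t] for t in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every type")
    return {t: p / total for t, p in unnormalized.items()}


# Hypothetical two-type model of a potential coalition partner.
prior = {"expert": 0.5, "novice": 0.5}
# Assumed probability that a joint task succeeds, given the partner's type.
success_likelihood = {"expert": 0.9, "novice": 0.3}

# Belief after observing one successful joint task.
posterior = update_type_belief(prior, success_likelihood)
```

Because the updated belief concerns the partner's type rather than a particular task, it can be carried over as the prior in a later, possibly different, coalition setting.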
