Planning, Learning and Coordination in Multiagent Decision Processes

There has been growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of techniques from decision-theoretic planning and reinforcement learning and describe several issues that arise in coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which the agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.
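To make the state-local view of coordination concrete, here is a minimal sketch (not the paper's algorithm) of two agents in a shared-utility process learning a convention independently at each state by best-responding to the empirical frequencies of each other's past choices, in the spirit of fictitious play. The state names, action names, matching-game reward, and episode loop are all hypothetical illustration.

```python
# A minimal sketch, assuming a toy shared-utility (identical-payoff) game
# played at each state: agents coordinate by best-responding to the
# empirical action frequencies of the other agent, per state.
import random
from collections import defaultdict

ACTIONS = ["left", "right"]          # hypothetical actions; reward = 1 iff they match
STATES = ["s0", "s1", "s2"]          # hypothetical toy state space

def reward(a1, a2):
    """Shared utility: both agents receive the same payoff."""
    return 1.0 if a1 == a2 else 0.0

# counts[agent][state][action] = how often the *other* agent chose `action` there
counts = [defaultdict(lambda: defaultdict(int)) for _ in range(2)]

def best_response(agent, state):
    """Best-respond to the other agent's observed action frequencies at this state."""
    c = counts[agent][state]
    if not c:                         # no observations yet: explore randomly
        return random.choice(ACTIONS)
    # In a matching game, the best response is the other agent's modal action.
    return max(ACTIONS, key=lambda a: c[a])

for episode in range(200):
    state = random.choice(STATES)     # states visited in arbitrary order here
    a1 = best_response(0, state)
    a2 = best_response(1, state)
    r = reward(a1, a2)
    counts[0][state][a2] += 1         # each agent observes the other's choice
    counts[1][state][a1] += 1

# After enough visits, both agents settle on the same action at each state:
# a learned convention, local to each state.
for s in STATES:
    print(s, best_response(0, s), best_response(1, s))
```

Because the best responses are computed independently per state, the learned convention extends no further than the states actually visited; the structured representations discussed in the paper are aimed at exactly this kind of generalization across states.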
