Game Theoretic Control for Robot Teams

In the real world, noisy sensors and limited communication make it difficult for robot teams to coordinate on tightly coupled tasks. Team members cannot simply apply single-robot solution techniques for partially observable problems in parallel, because those techniques do not account for the recursive effect that reasoning about the beliefs of others has on policy generation. Instead, we must turn to a game theoretic approach to model the problem correctly. Partially observable stochastic games (POSGs) provide a principled model for decentralized robot teams; however, this model quickly becomes intractable. In previous work we presented an algorithm for lookahead search in POSGs. Here we present an extension that reduces computation during lookahead by clustering similar observation histories together. We show that by clustering histories that have similar profiles of predicted reward, we can greatly reduce the computation time required to solve a POSG while maintaining a good approximation to the optimal policy. We demonstrate the power of the clustering algorithm in a real-time robot controller as well as on a simple benchmark problem.
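The core idea, clustering observation histories whose predicted-reward profiles are close, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the greedy representative-based clustering, the max-norm distance, and the `epsilon` threshold are all assumptions chosen for clarity.

```python
def cluster_histories(profiles, epsilon=0.1):
    """Greedily group observation histories whose predicted-reward
    profiles lie within `epsilon` (max-norm) of a cluster representative.

    profiles: dict mapping an observation history (a tuple) to a list of
              predicted rewards, one entry per candidate policy.
    Returns a list of clusters, each a list of histories. Histories in
    the same cluster can share one policy during lookahead, shrinking
    the search space.
    """
    clusters = []  # list of (representative_profile, member_histories)
    for history, profile in profiles.items():
        for rep, members in clusters:
            # Join the first cluster whose representative is close enough.
            if max(abs(a - b) for a, b in zip(rep, profile)) <= epsilon:
                members.append(history)
                break
        else:
            # No close representative found: start a new cluster.
            clusters.append((profile, [history]))
    return [members for _, members in clusters]
```

For example, two histories with reward profiles `[1.0, 0.5]` and `[1.02, 0.48]` would merge under `epsilon=0.1`, while a history with profile `[0.0, 2.0]` would remain separate; lookahead then expands one subtree per cluster instead of one per history.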
