Modeling and Planning with Macro-Actions in Decentralized POMDPs

Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized multi-agent decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions that may require different amounts of time to execute. We model macro-actions as options in a Dec-POMDP, focusing on options that depend only on information directly available to the agent during execution. As a result, we model systems in which coordination decisions occur only at the level of choosing which macro-actions to execute. The core technical difficulty in this setting is that the options chosen by the agents no longer terminate at the same time. We extend three leading Dec-POMDP algorithms for policy generation to the macro-action case and demonstrate their effectiveness on both standard benchmarks and a multi-robot coordination problem. The results show that our new algorithms retain agent coordination while producing high-quality solutions for significantly longer horizons and larger state spaces than previous Dec-POMDP methods. Furthermore, in the multi-robot domain, we show that, in contrast to most existing methods, which are specialized to a particular problem class, our approach can synthesize control policies that exploit opportunities for coordination while balancing uncertainty, sensor information, and information about other agents.
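
To make the setting concrete, the sketch below shows one plausible way to structure decentralized execution of options with asynchronous termination. It is an illustration under assumed interfaces, not the paper's implementation: `MacroAction`, `Agent`, `go_to`, and `ToyEnv` are all hypothetical names, and the planner that would produce each agent's high-level `macro_policy` is elided (here it is a fixed rule). What it demonstrates is the point the abstract makes: each agent's option policy and termination condition read only that agent's local observation history, and new macro-action choices, the coordination decisions, are made per agent whenever that agent's own option ends, rather than in lockstep.

```python
# A minimal sketch (not the authors' implementation) of decentralized
# macro-action execution with asynchronous option termination.
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MacroAction:
    """An option: a local policy plus a termination condition. Both see only
    the agent's own observation history, matching the assumption that
    execution depends only on information locally available to the agent."""
    name: str
    policy: Callable[[List[int]], str]       # obs history -> primitive action
    terminates: Callable[[List[int]], bool]  # obs history -> option finished?


def go_to(goal: int) -> MacroAction:
    """Hypothetical navigation option: head toward `goal`; terminate when the
    latest (noisy) observation says the agent has arrived."""
    def policy(history: List[int]) -> str:
        last = history[-1] if history else 0
        return "right" if last < goal else ("left" if last > goal else "stay")

    def terminates(history: List[int]) -> bool:
        return bool(history) and history[-1] == goal

    return MacroAction(f"go_to({goal})", policy, terminates)


@dataclass
class Agent:
    macro_policy: Callable[[List[int]], MacroAction]  # high-level policy
    obs_history: List[int] = field(default_factory=list)
    current: Optional[MacroAction] = None


class ToyEnv:
    """Two agents on a line; each gets only a noisy reading of its own position."""
    def __init__(self) -> None:
        self.pos = [0, 4]
        self.goals = [3, 1]

    def step(self, joint_action: List[str]):
        moves = {"left": -1, "right": 1, "stay": 0}
        self.pos = [p + moves[a] for p, a in zip(self.pos, joint_action)]
        obs = [p + random.choice((-1, 0, 0, 0, 1)) for p in self.pos]  # noisy sensing
        return obs, self.pos == self.goals


def run_episode(agents: List[Agent], env: ToyEnv, horizon: int = 50) -> None:
    """Each step, every agent follows its current option; an agent selects a
    NEW macro-action only when its own option terminates, so options end
    asynchronously and coordination happens only at option boundaries."""
    for _ in range(horizon):
        for ag in agents:
            if ag.current is None or ag.current.terminates(ag.obs_history):
                ag.current = ag.macro_policy(ag.obs_history)  # high-level decision point
        joint_action = [ag.current.policy(ag.obs_history) for ag in agents]
        obs, done = env.step(joint_action)
        for ag, o in zip(agents, obs):
            ag.obs_history.append(o)
        if done:
            break


env = ToyEnv()
agents = [Agent(macro_policy=lambda h: go_to(3)),
          Agent(macro_policy=lambda h: go_to(1))]
run_episode(agents, env)
```

In a full planner, `macro_policy` would be the object being optimized (e.g., by the extended dynamic-programming or search algorithms the abstract describes); the execution loop above is only meant to show where asynchronous termination forces those algorithms to depart from the standard synchronized-step Dec-POMDP formulation.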
