Decentralized control of Partially Observable Markov Decision Processes using belief space macro-actions

The focus of this paper is on solving multi-robot planning problems in continuous spaces under partial observability. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems, but representing and solving Dec-POMDPs is often intractable for large problems. To allow a high-level representation that is natural for multi-robot problems and scalable to large discrete and continuous domains, this paper extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP). The Dec-POSMDP formulation allows asynchronous decision-making by the robots, which is crucial in multi-robot domains. We also present an algorithm for solving the Dec-POSMDP that is significantly more scalable than previous methods because it incorporates closed-loop belief space macro-actions into planning. These macro-actions are constructed automatically to produce robust solutions. The proposed method is evaluated on a complex multi-robot package-delivery problem under uncertainty, showing that our approach can naturally represent multi-robot problems and provide high-quality solutions at large scale.
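To make the asynchronous-execution idea concrete, the following is a minimal sketch, not the paper's implementation: each robot runs a closed-loop belief space macro-action until that macro-action's own termination condition fires, and only then consults its high-level policy for the next one, so high-level decision points need not be synchronized across robots. All class, function, and field names here (MacroAction, Robot, progress, etc.) are illustrative assumptions, and the belief update is simulated rather than computed from a real controller.

```python
import random

class MacroAction:
    """A closed-loop belief-space controller with a termination condition
    (hypothetical interface; stands in for an automatically constructed
    feedback controller)."""
    def __init__(self, name, mean_duration):
        self.name = name
        self.mean_duration = mean_duration

    def execute_step(self, belief):
        # Placeholder for one closed-loop control step that would update
        # the robot's local belief; here we just simulate progress.
        belief["progress"] += random.random() / self.mean_duration
        return belief

    def has_terminated(self, belief):
        # A macro-action ends when its termination belief region is
        # reached, not after a fixed number of primitive steps.
        return belief["progress"] >= 1.0

class Robot:
    """One decision-maker: it picks a new macro-action from its local
    high-level policy whenever the current one terminates."""
    def __init__(self, policy, macro_actions):
        self.policy = policy              # maps local history -> macro-action
        self.macro_actions = macro_actions
        self.history = ()
        self.current = None
        self.belief = {"progress": 1.0}   # no active macro-action yet

    def step(self):
        if self.current is None or self.current.has_terminated(self.belief):
            # Asynchronous decision point: this robot re-plans now,
            # regardless of where the other robots are in their own
            # macro-actions.
            self.history += (self.current.name if self.current else "start",)
            self.current = self.policy(self.history, self.macro_actions)
            self.belief = {"progress": 0.0}
        self.belief = self.current.execute_step(self.belief)

def random_policy(history, macro_actions):
    # Stand-in for the high-level policy produced by the planner.
    return random.choice(macro_actions)

# Two robots whose macro-actions take different amounts of time, so
# their high-level decisions naturally fall out of sync.
actions = [MacroAction("goto_base", 5), MacroAction("pickup", 2),
           MacroAction("deliver", 8)]
robots = [Robot(random_policy, actions), Robot(random_policy, actions)]
for t in range(50):
    for r in robots:
        r.step()
for i, r in enumerate(robots):
    print(f"robot {i} executed: {r.history}")
```

Running the sketch shows the two robots completing different numbers of macro-actions over the same wall-clock horizon, which is exactly the asynchrony the semi-Markov formulation captures and a synchronous Dec-POMDP step model does not.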
