Decentralized control of Partially Observable Markov Decision Processes using belief space macro-actions

The focus of this paper is on solving multi-robot planning problems in continuous spaces under partial observability. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems, but representing and solving Dec-POMDPs is often intractable for large problems. To allow a high-level representation that is natural for multi-robot problems and scalable to large discrete and continuous domains, this paper extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP). The Dec-POSMDP formulation allows asynchronous decision-making by the robots, which is crucial in multi-robot domains. We also present an algorithm for solving the Dec-POSMDP that is significantly more scalable than previous methods because it incorporates closed-loop belief space macro-actions into planning. These macro-actions are constructed automatically to produce robust solutions. The proposed method is evaluated on a complex multi-robot package-delivery problem under uncertainty, showing that our approach can naturally represent multi-robot problems and provide high-quality solutions at large scale.
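To make the asynchronous-execution idea concrete, the following is a minimal sketch, not the paper's implementation: each robot runs a closed-loop belief space macro-action until that macro-action's own termination condition fires, and only then consults its high-level policy for the next one, so high-level decision points need not be synchronized across robots. All class, function, and field names here (MacroAction, Robot, progress, etc.) are illustrative assumptions, and the belief update is simulated rather than computed from a real controller.

```python
import random

class MacroAction:
    """A closed-loop belief-space controller with a termination condition
    (hypothetical interface; stands in for an automatically constructed
    feedback controller)."""
    def __init__(self, name, mean_duration):
        self.name = name
        self.mean_duration = mean_duration

    def execute_step(self, belief):
        # Placeholder for one closed-loop control step that would update
        # the robot's local belief; here we just simulate progress.
        belief["progress"] += random.random() / self.mean_duration
        return belief

    def has_terminated(self, belief):
        # A macro-action ends when its termination belief region is
        # reached, not after a fixed number of primitive steps.
        return belief["progress"] >= 1.0

class Robot:
    """One decision-maker: it picks a new macro-action from its local
    high-level policy whenever the current one terminates."""
    def __init__(self, policy, macro_actions):
        self.policy = policy              # maps local history -> macro-action
        self.macro_actions = macro_actions
        self.history = ()
        self.current = None
        self.belief = {"progress": 1.0}   # no active macro-action yet

    def step(self):
        if self.current is None or self.current.has_terminated(self.belief):
            # Asynchronous decision point: this robot re-plans now,
            # regardless of where the other robots are in their own
            # macro-actions.
            self.history += (self.current.name if self.current else "start",)
            self.current = self.policy(self.history, self.macro_actions)
            self.belief = {"progress": 0.0}
        self.belief = self.current.execute_step(self.belief)

def random_policy(history, macro_actions):
    # Stand-in for the high-level policy produced by the planner.
    return random.choice(macro_actions)

# Two robots whose macro-actions take different amounts of time, so
# their high-level decisions naturally fall out of sync.
actions = [MacroAction("goto_base", 5), MacroAction("pickup", 2),
           MacroAction("deliver", 8)]
robots = [Robot(random_policy, actions), Robot(random_policy, actions)]
for t in range(50):
    for r in robots:
        r.step()
for i, r in enumerate(robots):
    print(f"robot {i} executed: {r.history}")
```

Running the sketch shows the two robots completing different numbers of macro-actions over the same wall-clock horizon, which is exactly the asynchrony the semi-Markov formulation captures and a synchronous Dec-POMDP step model does not.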
