Graph-based Cross Entropy method for solving multi-robot decentralized POMDPs
Jonathan P. How | Christopher Amato | Ali-akbar Agha-mohammadi | Shayegan Omidshafiei | Shih-Yuan Liu | John Vian
[1] Jonathan P. How, et al. Planning for decentralized control of multiple robots under uncertainty, 2014, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[2] Jonathan P. How, et al. Stick-Breaking Policy Learning in Dec-POMDPs, 2015, IJCAI.
[3] Nancy M. Amato, et al. FIRM: Sampling-based feedback motion-planning under motion uncertainty and imperfect measurements, 2014, Int. J. Robotics Res.
[4] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.
[5] Jonathan P. How, et al. Health aware stochastic planning for persistent package delivery missions using quadrotors, 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[6] Jonathan P. How, et al. Policy search for multi-robot coordination under uncertainty, 2015, Int. J. Robotics Res.
[7] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[8] Brahim Chaib-draa, et al. Exact Dynamic Programming for Decentralized POMDPs with Lossless Policy Compression, 2008, ICAPS.
[9] Shlomo Zilberstein, et al. Memory-Bounded Dynamic Programming for DEC-POMDPs, 2007, IJCAI.
[10] Frans A. Oliehoek, et al. Decentralized POMDPs, 2012, Reinforcement Learning.
[11] Shlomo Zilberstein, et al. Bounded Policy Iteration for Decentralized POMDPs, 2005, IJCAI.
[12] Jonathan P. How, et al. Decentralized control of partially observable Markov decision processes, 2013, 52nd IEEE Conference on Decision and Control.
[13] François Charpillet, et al. An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs, 2005, ECML.
[14] Alborz Geramifard, et al. Decentralized control of Partially Observable Markov Decision Processes using belief space macro-actions, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[15] Shimon Whiteson, et al. Lossless clustering of histories in decentralized POMDPs, 2009, AAMAS.
[16] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[17] Nikos A. Vlassis, et al. The Cross-Entropy Method for Policy Search in Decentralized POMDPs, 2008, Informatica.
[18] François Charpillet, et al. MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, 2005, UAI.
[19] Shlomo Zilberstein, et al. Policy Iteration for Decentralized Control of Markov Decision Processes, 2009, J. Artif. Intell. Res.
[20] David Hsu, et al. Monte Carlo Value Iteration with Macro-Actions, 2011, NIPS.
[21] Shlomo Zilberstein, et al. Value-based observation compression for DEC-POMDPs, 2008, AAMAS.
[22] Shlomo Zilberstein, et al. Incremental Policy Generation for Finite-Horizon DEC-POMDPs, 2009, ICAPS.
[23] Anne Condon, et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems, 1999, AAAI/IAAI.
[24] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[25] Nikos A. Vlassis, et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs, 2008, J. Artif. Intell. Res.
[26] Leslie Pack Kaelbling, et al. Planning with macro-actions in decentralized POMDPs, 2014, AAMAS.
[27] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.
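
The paper's title indicates it builds on the cross-entropy (CE) method for policy search (see [17] and [27] above). For orientation, the following is a minimal, self-contained sketch of generic CE policy optimization; the toy chain environment, the softmax policy parameterization, and all hyperparameters are illustrative assumptions, not the paper's graph-based Dec-POMDP solver.

```python
# Minimal cross-entropy (CE) policy-search sketch, in the spirit of
# [17] and [27]. The environment, policy form, and hyperparameters
# below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HORIZON = 4, 2, 20      # toy chain MDP (assumed)
N_SAMPLES, N_ELITE, N_ITERS = 100, 10, 30    # CE hyperparameters (assumed)

def rollout(theta):
    """Cumulative reward of one episode under a softmax policy.

    Toy chain: action 1 moves right, action 0 moves left, and a reward
    of 1 is collected whenever the agent sits in the rightmost state.
    """
    weights = theta.reshape(N_STATES, N_ACTIONS)
    state, total = 0, 0.0
    for _ in range(HORIZON):
        logits = weights[state]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        action = rng.choice(N_ACTIONS, p=probs)
        state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
        total += 1.0 if state == N_STATES - 1 else 0.0
    return total

# CE loop: sample policy parameters from a diagonal Gaussian, keep the
# top-scoring "elite" samples, and refit the Gaussian to the elite set.
dim = N_STATES * N_ACTIONS
mean, std = np.zeros(dim), np.ones(dim)
for it in range(N_ITERS):
    samples = rng.normal(mean, std, size=(N_SAMPLES, dim))
    scores = np.array([rollout(s) for s in samples])
    elite_idx = np.argsort(scores)[-N_ELITE:]
    mean = samples[elite_idx].mean(axis=0)
    std = samples[elite_idx].std(axis=0) + 1e-3   # noise floor avoids collapse
    print(f"iter {it:2d}: mean elite return = {scores[elite_idx].mean():.2f}")
```

In [17], a similar CE loop is applied to joint Dec-POMDP policies; per its title, this paper runs such a search over graph-structured policies for multiple robots.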