Integrating Decision Sharing with Prediction in Decentralized Planning for Multi-Agent Coordination under Uncertainty

The performance of decentralized multi-agent systems tends to benefit from information sharing and its effective utilization. However, excessive or unnecessary sharing may hinder performance because of the delay, instability, and additional overhead of communication. To achieve satisfactory coordination performance, one would prefer the communication cost to be as low as possible. In this paper, we propose an approach that improves the utilization of shared information by integrating information sharing with prediction in decentralized planning. We present a novel planning algorithm, called Dec-MCTS-SP, that combines decision sharing and prediction on top of decentralized Monte Carlo Tree Search. Each agent grows a search tree guided by rewards calculated from joint actions, which can be not only sampled from the shared probability distributions over action sequences but also predicted by a sufficiently accurate and computationally cheap heuristics-based method. In addition, several policies, including sparse and discounted UCT and DIY-bonus, are leveraged to improve performance. We have implemented Dec-MCTS-SP in a case study on multi-agent information gathering under threat and uncertainty, which is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). Factored belief vectors are integrated into Dec-MCTS-SP to handle the uncertainty. Compared with a random algorithm, an auction-based algorithm, and Dec-MCTS, the evaluation shows that Dec-MCTS-SP can reduce communication cost significantly while still achieving surprisingly higher coordination performance.
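The abstract names discounted UCT as one of the tree policies. As a rough illustration of the general idea only, and not the paper's implementation, the sketch below shows a discounted upper-confidence-bound rule over the children of a single node: older rewards are down-weighted by a factor gamma, so the statistics can track rewards that drift as other agents change their plans. The names `DiscountedArm`, `select_arm`, and `update`, and the default values of `gamma` and `c`, are assumptions made for this example.

```python
import math


class DiscountedArm:
    """Discounted statistics for one child action (hypothetical helper)."""

    def __init__(self):
        self.weighted_reward = 0.0  # sum of gamma^(age) * reward
        self.weighted_count = 0.0   # sum of gamma^(age) over past selections


def select_arm(arms, c=1.0):
    """Return the index of the arm with the highest discounted UCB score."""
    # Try every arm once before applying the confidence bound.
    for i, arm in enumerate(arms):
        if arm.weighted_count == 0.0:
            return i
    total = sum(a.weighted_count for a in arms)
    # Clamp so the logarithm is never negative when the discounted total < 1.
    log_total = math.log(max(total, 1.0))

    def score(arm):
        mean = arm.weighted_reward / arm.weighted_count
        bonus = c * math.sqrt(log_total / arm.weighted_count)
        return mean + bonus

    return max(range(len(arms)), key=lambda i: score(arms[i]))


def update(arms, chosen, reward, gamma=0.9):
    """Decay all statistics by gamma, then record the new reward."""
    for arm in arms:
        arm.weighted_reward *= gamma
        arm.weighted_count *= gamma
    arms[chosen].weighted_reward += reward
    arms[chosen].weighted_count += 1.0
```

In a tree search, a rule of this kind would be applied at each node during selection. Because an unvisited child's discounted count keeps shrinking, its exploration bonus grows, so the search periodically revisits it.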
