Cooperation without Coordination: Hierarchical Predictive Planning for Decentralized Multiagent Navigation

Decentralized multiagent planning raises many challenges, such as adaptation to environment changes that cannot be explained by the agent's own behavior, coordination from noisy sensor inputs such as lidar, and cooperation without knowledge of other agents' intents. To address these challenges, we present hierarchical predictive planning (HPP) for decentralized multiagent navigation tasks. HPP learns prediction models for itself and its teammates, and uses these models to propose and evaluate navigation goals that complete the cooperative task without explicit coordination. To learn the prediction models, HPP observes other agents' behavior and learns to map its own sensor observations to the predicted locations of other agents. HPP then uses the cross-entropy method to iteratively propose, evaluate, and improve navigation goals, under the assumption that all agents in the team share a common objective. HPP removes the need for a centralized operator (i.e., robots determine their own actions without coordinating their beliefs or plans) and can be trained and easily transferred to real-world environments. The results show that HPP generalizes to new environments, including a real-world robot team. It is also 33x more sample efficient than a baseline and performs better in complex environments. The video and website for this paper can be found at this https URL and this https URL.
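The goal-selection loop described above follows the standard cross-entropy method: sample candidate goals, score them, keep the best, and refit the sampling distribution. The sketch below illustrates that loop; the goal parameterization, the score function, and the `predicted_meeting_point` variable are illustrative assumptions, not the paper's actual models or hyperparameters.

```python
# Minimal sketch of a CEM-based goal-selection loop (illustrative only).
import numpy as np

def cem_goal_selection(score_fn, goal_dim=2, n_samples=64, n_elite=8,
                       n_iters=10, init_mean=None, init_std=2.0):
    """Iteratively propose, evaluate, and refit candidate navigation goals."""
    mean = np.zeros(goal_dim) if init_mean is None else np.asarray(init_mean, float)
    std = np.full(goal_dim, init_std)
    for _ in range(n_iters):
        # Propose candidate goals from the current Gaussian distribution.
        samples = np.random.normal(mean, std, size=(n_samples, goal_dim))
        # Evaluate each candidate goal (e.g., using learned prediction models).
        scores = np.array([score_fn(g) for g in samples])
        # Keep the highest-scoring candidates and refit the distribution to them.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean  # the selected navigation goal

# Hypothetical usage: reward goals near the predicted meeting point of the team.
predicted_meeting_point = np.array([3.0, -1.5])
goal = cem_goal_selection(lambda g: -np.linalg.norm(g - predicted_meeting_point))
```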
