Learning for multi-robot cooperation in partially observable stochastic environments with macro-actions

This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or that a full simulator is available during planning. Previous methods that aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations exist that involve a large team of heterogeneous robots operating over long planning horizons. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm achieves better solution quality than state-of-the-art learning-based methods. We implement two variants of a multi-robot Search and Rescue (SAR) domain (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots cooperating in a partially observable stochastic environment.
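To make the learning setting concrete, below is a minimal sketch of one way to perform reward-weighted Expectation-Maximization over per-robot finite-state controllers (FSCs) using only (observation, macro-action, reward) trajectories, in the spirit of prior EM methods for Dec-POMDPs. It is not the paper's iSEM algorithm: the controller sizes, the exponentiated-return trajectory weighting, the forward-only E-step, the fixed start node, and all names are illustrative assumptions, and the iterative resampling iSEM uses to mitigate local optima is omitted.

```python
# Hypothetical sketch: reward-weighted EM for a single robot's finite-state
# controller (FSC), learned purely from (observation, macro-action, reward)
# trajectories. Not the paper's iSEM; sizes and weighting are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_NODES, N_OBS, N_MACRO = 4, 5, 3  # assumed controller / discrete-space sizes


def init_fsc():
    """Random stochastic FSC: MA policy pi(a|q) and node transitions eta(q'|q,o)."""
    pi = rng.dirichlet(np.ones(N_MACRO), size=N_NODES)             # shape (Q, A)
    eta = rng.dirichlet(np.ones(N_NODES), size=(N_NODES, N_OBS))   # shape (Q, O, Q')
    return pi, eta


def em_step(pi, eta, trajectories, temperature=1.0):
    """One EM update; each trajectory is a list of (obs, macro-action, reward)."""
    pi_counts = np.full((N_NODES, N_MACRO), 1e-3)          # smoothed expected counts
    eta_counts = np.full((N_NODES, N_OBS, N_NODES), 1e-3)
    for traj in trajectories:
        ret = sum(r for _, _, r in traj)
        w = np.exp(temperature * ret)      # weight trajectory by exponentiated return
        b = np.zeros(N_NODES)
        b[0] = 1.0                         # assume the controller starts in node 0
        for o, a, _ in traj:
            pi_counts[:, a] += w * b                # expected (node, action) counts
            joint = b[:, None] * eta[:, o, :]       # P(q, q' | o) under current eta
            eta_counts[:, o, :] += w * joint
            b = joint.sum(axis=0)                   # forward message: next node belief
    # M-step: renormalize expected counts into valid conditional distributions.
    new_pi = pi_counts / pi_counts.sum(axis=1, keepdims=True)
    new_eta = eta_counts / eta_counts.sum(axis=2, keepdims=True)
    return new_pi, new_eta


# Usage with synthetic trajectories (purely illustrative data).
pi, eta = init_fsc()
trajs = [[(rng.integers(N_OBS), rng.integers(N_MACRO), rng.normal())
          for _ in range(10)] for _ in range(50)]
for _ in range(20):
    pi, eta = em_step(pi, eta, trajs)
```

In practice, returns would be normalized before exponentiation to avoid overflow, the forward-only message would be replaced by a forward-backward pass for exact expected counts, and one such controller would be maintained per robot in the team.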
