Graph Policy Gradients for Large Scale Robot Control

In this paper, we consider the problem of learning policies to control a large number of homogeneous robots. To this end, we propose a new algorithm, which we call Graph Policy Gradients (GPG), that exploits the underlying graph symmetry among the robots. The curse of dimensionality that arises when working with a large number of robots is mitigated by using a graph convolutional neural network (GCN) to parametrize the robots' policies. The GCN reduces the dimensionality of the problem by learning filters that aggregate information locally among robots, much as a convolutional neural network learns local features in an image. Through experiments on formation flying, we show that our proposed method scales better than existing reinforcement learning methods that employ fully connected networks. More importantly, we show that by using our locally learned filters we are able to zero-shot transfer policies trained on just three robots to over a hundred robots.
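The idea of locally aggregating filters can be illustrated with a minimal sketch. It assumes a polynomial graph filter of the form z = tanh(Σ_k S^k X H_k) as used in the graph signal processing literature, where S is the robots' adjacency matrix, X stacks per-robot features, and each H_k is a learnable weight matrix. The helper names (`adjacency`, `graph_filter_layer`), the communication radius, the tanh nonlinearity, and the use of raw positions as features are hypothetical choices for illustration, not the paper's implementation.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def adjacency(positions, radius):
    """Hypothetical graph construction: robots i and j are neighbours
    if their Euclidean distance is below `radius` (no self-loops)."""
    n = len(positions)
    return [[1.0 if i != j and math.dist(positions[i], positions[j]) < radius else 0.0
             for j in range(n)] for i in range(n)]


def graph_filter_layer(S: Matrix, X: Matrix, taps: list[Matrix]) -> Matrix:
    """Computes tanh(sum_k S^k X H_k).

    S: N x N adjacency, X: N x F features, taps[k]: F x G matrix H_k.
    The taps' shapes do not depend on N, so the same parameters apply
    to any number of robots."""
    n, g = len(X), len(taps[0][0])
    out = [[0.0] * g for _ in range(n)]
    shifted = X  # S^0 X
    for k, H in enumerate(taps):
        if k > 0:
            # S^k X = S (S^{k-1} X): one more hop of neighbour exchange
            shifted = matmul(S, shifted)
        contrib = matmul(shifted, H)
        for i in range(n):
            for j in range(g):
                out[i][j] += contrib[i][j]
    return [[math.tanh(v) for v in row] for row in out]


# Usage: one set of (randomly initialised, untrained) taps, two team sizes.
random.seed(0)
K, F, G = 2, 2, 2  # filter order, input features, output (action) dims
taps = [[[random.gauss(0, 0.5) for _ in range(G)] for _ in range(F)]
        for _ in range(K + 1)]

for n in (3, 100):
    pos = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(n)]
    S = adjacency(pos, radius=2.0)
    actions = graph_filter_layer(S, [list(p) for p in pos], taps)
    print(n, len(actions), len(actions[0]))  # prints "3 3 2", then "100 100 2"
```

Because each H_k has shape F × G regardless of the number of robots N, the same parameters run unchanged on three or a hundred robots, which is the property that makes zero-shot transfer across team sizes possible. Row i of S^k X depends only on robots within k hops of robot i, so each robot could evaluate its own output using information exchanged with its neighbours.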
