Towards Intelligent Aircraft Through Deep Reinforcement Learning

Deep reinforcement learning has recently succeeded at solving games and learning robotics tasks from scratch, and has shown early promise for the guidance, navigation, and control of micro aerial vehicles (MAVs). Though MAV control is well established, many complex tasks still require human oversight, and techniques for reducing human involvement remain nascent. In this paper, we present ongoing work on applying continuous-action deep reinforcement learning to autonomous aircraft in simulation, to learn such complex tasks autonomously. We give a brief overview of our simulation environment and tasks of interest, and present preliminary results from using model-free methods to learn simple flight tasks. We conclude with remarks on research directions that we believe will shape the future of unmanned systems.
