Sample Efficient Learning of Path Following and Obstacle Avoidance Behavior for Quadrotors

In this letter, we propose an algorithm for training neural network control policies for quadrotors. The learned control policy computes control commands directly from sensor inputs and is therefore computationally efficient. An imitation learning algorithm produces a policy that reproduces the behavior of a supervisor, which provides demonstrations of path-following and collision-avoidance maneuvers. Due to the generalization ability of neural networks, the resulting policy performs local collision avoidance while following a global reference path. The algorithm uses a time-free model-predictive path-following controller as the supervisor, which generates demonstrations by following a few example paths. This yields an easy-to-implement learning algorithm that is robust to errors in the model used by the model-predictive controller. The policy is trained on the real quadrotor, which requires collision-free exploration around the example paths; an adapted version of the supervisor enables this exploration. Thus, the policy can be trained from a relatively small number of examples on the real quadrotor, making the training sample efficient.
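
The described training scheme belongs to the DAgger family of imitation learning methods, in which the learner repeatedly rolls out its current policy while the supervisor labels the visited states with corrective commands. The sketch below is a minimal illustration of such a loop under assumed interfaces, not the implementation from the letter; `policy`, `mpc_supervisor`, and `env` are hypothetical placeholders.

```python
import numpy as np

def train_imitation_policy(policy, mpc_supervisor, env, n_iters=10):
    """DAgger-style training loop (hypothetical interfaces).

    Rolls out a mixture of supervisor and learner actions, labels every
    visited state with the supervisor's command, and retrains the policy
    on the aggregated dataset.
    """
    states, labels = [], []
    for i in range(n_iters):
        # Anneal the supervisor's share of the executed actions so that
        # early rollouts stay close to the demonstrated, collision-free
        # example paths (safe exploration on the real vehicle).
        beta = 1.0 - i / n_iters
        obs, done = env.reset(), False
        while not done:
            u_sup = mpc_supervisor.control(obs)   # corrective label
            u_pol = policy.predict(obs)           # learner's action
            u = u_sup if np.random.rand() < beta else u_pol
            states.append(obs)
            labels.append(u_sup)                  # always label with the supervisor
            obs, done = env.step(u)
        # Supervised regression on all state-label pairs gathered so far.
        policy.fit(np.asarray(states), np.asarray(labels))
    return policy
```

Mixing supervisor and learner actions keeps exploration near the collision-free example paths, which is what makes it feasible to train directly on the real quadrotor from relatively few samples.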
