Toward Exploring End-to-End Learning Algorithms for Autonomous Aerial Machines

We develop Air Learning, a tool suite for end-to-end closed-loop UAV analysis, equipped with a customized yet randomized environment generator that exposes the UAV to a diverse set of challenges. We take Deep Q-Networks (DQN) as an example deep reinforcement learning algorithm and use curriculum learning to train a point-to-point obstacle-avoidance policy. We select the best policy based on success rate and then evaluate it under strict resource constraints on an embedded platform such as the Raspberry Pi 3. Using a hardware-in-the-loop methodology, we quantify the policy's performance with quality-of-flight metrics such as energy consumed, endurance, and average trajectory length. We find that the trajectories produced on the embedded platform differ markedly from those predicted on the desktop, resulting in up to 26.43% longer trajectories. Quality-of-flight metrics with hardware in the loop characterize those differences in simulation, thereby exposing how the choice of onboard compute contributes to shortening or widening the 'Sim2Real' gap.
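The quality-of-flight metrics above can be made concrete with a minimal sketch: given a logged trajectory (a sequence of 3-D waypoints) plus measured energy and flight time, compute trajectory length and compare a desktop rollout against an embedded rollout. The function name `quality_of_flight` and all numbers below are hypothetical illustrations, not the paper's implementation.

```python
import math

def quality_of_flight(trajectory, energy_joules, flight_time_s):
    """Compute simple quality-of-flight metrics from a logged flight.

    trajectory: list of (x, y, z) positions in meters.
    energy_joules: total energy consumed over the flight.
    flight_time_s: total flight (endurance) time in seconds.
    """
    # Trajectory length: sum of Euclidean distances between consecutive points.
    length = sum(math.dist(p, q) for p, q in zip(trajectory, trajectory[1:]))
    return {
        "trajectory_length_m": length,
        "energy_consumed_j": energy_joules,
        "endurance_s": flight_time_s,
    }

# Illustrative comparison: the same policy rolled out on a desktop versus an
# embedded platform may trace different paths to the same goal (3, 4, 0).
desktop = quality_of_flight([(0, 0, 0), (3, 4, 0)], 120.0, 10.0)
embedded = quality_of_flight([(0, 0, 0), (3, 0, 0), (3, 4, 0)], 150.0, 13.0)

# Percentage increase in trajectory length on the embedded platform.
gap_pct = (embedded["trajectory_length_m"] / desktop["trajectory_length_m"] - 1) * 100
```

In this toy example the embedded trajectory is 7 m versus 5 m on the desktop, a 40% increase; the paper reports differences of up to 26.43% under its actual hardware-in-the-loop setup.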
