Learning to Seek: Deep Reinforcement Learning for Phototaxis of a Nano Drone in an Obstacle Field

Nano drones are uniquely suited to fully autonomous applications thanks to their agility, low cost, and small size. However, their constrained form factor limits flight time, sensor payload, and compute capability. While visual servoing of nano drones can accomplish complex tasks, state-of-the-art solutions come at a significant cost in endurance and hardware. The primary goal of our work is to demonstrate phototaxis in an obstacle field by adding only a lightweight, low-cost light sensor to a nano drone. We deploy a deep reinforcement learning model capable of following direct paths even with noisy sensor readings. By carefully designing the network input, we feed the agent features relevant to finding the light source while reducing computational cost, enabling inference at up to 100 Hz onboard the nano drone. We verify our approach in simulation and with in-field testing on a Bitcraze Crazyflie, achieving a 94% success rate in cluttered, randomized test environments. The policy demonstrates efficient light seeking, reaching the goal in simulation in 65% fewer steps and with 60% shorter paths than a baseline random-walker algorithm.
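To give intuition for the random-walker baseline and why intensity-guided seeking is so much more step-efficient, the toy 2D sketch below compares a uniform random walker against a greedy seeker that probes a light-intensity field in eight headings and moves toward the brightest one. Everything here is a hypothetical stand-in: the inverse-square `light_intensity` model, the step length, and the goal radius are illustrative assumptions, not the paper's simulator, sensor model, or learned policy.

```python
import math
import random

def light_intensity(pos, source=(0.0, 0.0)):
    """Toy inverse-square light model (hypothetical stand-in for the real sensor)."""
    d2 = (pos[0] - source[0]) ** 2 + (pos[1] - source[1]) ** 2
    return 1.0 / (1.0 + d2)

def random_walker(start, steps=2000, step_len=0.1, goal_radius=0.5, rng=None):
    """Baseline: pick a uniformly random heading at every step."""
    rng = rng or random.Random(0)
    pos = list(start)
    for t in range(steps):
        if math.hypot(pos[0], pos[1]) < goal_radius:
            return t  # reached the light source
        theta = rng.uniform(0.0, 2.0 * math.pi)
        pos[0] += step_len * math.cos(theta)
        pos[1] += step_len * math.sin(theta)
    return steps  # step budget exhausted

def gradient_seeker(start, steps=2000, step_len=0.1, goal_radius=0.5):
    """Greedy seeker: probe intensity in 8 headings and move toward the brightest."""
    pos = list(start)
    headings = [k * math.pi / 4.0 for k in range(8)]
    for t in range(steps):
        if math.hypot(pos[0], pos[1]) < goal_radius:
            return t
        best = max(
            headings,
            key=lambda th: light_intensity((pos[0] + step_len * math.cos(th),
                                            pos[1] + step_len * math.sin(th))),
        )
        pos[0] += step_len * math.cos(best)
        pos[1] += step_len * math.sin(best)
    return steps

start = (5.0, 5.0)
rw = random_walker(start, rng=random.Random(42))
gs = gradient_seeker(start)
print(f"random walker: {rw} steps, gradient seeker: {gs} steps")
```

In this toy setting the greedy seeker heads almost straight for the source, while the random walker wastes most of its step budget; the paper's learned policy plays the role of the seeker while additionally handling obstacles and sensor noise, which this sketch ignores.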
