Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller

Fully autonomous navigation with nano drones has numerous real-world applications, ranging from search and rescue to source seeking. Nano drones are well suited for source seeking because of their agility, low price, and ubiquity. Unfortunately, their constrained form factor limits flight time, sensor payload, and compute capability, which currently precludes their use for source seeking in GPS-denied and highly cluttered environments. Here, we introduce a fully autonomous, deep reinforcement learning-based light-seeking nano drone. The 33-gram nano drone performs all computation on board an ultra-low-power microcontroller (MCU). We present a method for efficiently training, converting, and deploying deep reinforcement learning policies. Our training methodology and novel quantization scheme allow the trained policy to fit in 3 kB of memory; the quantization scheme uses representative input data and input scaling to arrive at a fully 8-bit model. Finally, we evaluate the approach in simulation and in flight tests on a Bitcraze Crazyflie, achieving an 80% average success rate in a highly cluttered and randomized test environment. Moreover, the drone finds the light source in 29% fewer steps than a baseline simulation (obstacle avoidance without source information). To our knowledge, this is the first deep reinforcement learning method to enable robust source seeking on such a highly constrained nano drone. Our general methodology is suitable for any highly constrained platform using deep reinforcement learning, source seeking or otherwise.
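The core idea of the quantization scheme described above, calibrating an affine 8-bit mapping from representative input data so the MCU can run purely in integer arithmetic, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names and the choice of per-tensor affine quantization are assumptions for the example.

```python
def quantize_int8(values, rep_min, rep_max):
    """Affine 8-bit quantization: map [rep_min, rep_max] onto [-128, 127].

    rep_min/rep_max come from representative input data gathered offline,
    so the scale and zero point are fixed constants at deployment time.
    """
    scale = (rep_max - rep_min) / 255.0
    zero_point = round(-128 - rep_min / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floating-point values from int8 codes."""
    return [(qi - zero_point) * scale for qi in q]

# Calibrate on a representative range, then quantize live inputs.
sensor_readings = [-1.0, -0.5, 0.0, 0.7, 1.0]
codes, scale, zp = quantize_int8(sensor_readings, rep_min=-1.0, rep_max=1.0)
recovered = dequantize(codes, scale, zp)
```

Because the scale and zero point are computed once from representative data, the on-board network never needs floating-point rescaling at run time, which is what makes a fully 8-bit model feasible on an MCU with a few kilobytes of memory.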
