Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller

Fully autonomous navigation with nano drones has numerous real-world applications, ranging from search and rescue to source seeking. Nano drones are well suited for source seeking because of their agility, low price, and ubiquity. Unfortunately, their constrained form factor limits flight time, sensor payload, and compute capability, which currently precludes their use for source seeking in GPS-denied and highly cluttered environments. Here, we introduce a fully autonomous, deep reinforcement learning-based light-seeking nano drone. The 33-gram nano drone performs all computation on board an ultra-low-power microcontroller (MCU). We present a method for efficiently training, converting, and deploying deep reinforcement learning policies. Our training methodology and novel quantization scheme allow the trained policy to fit in 3 kB of memory; the quantization scheme uses representative input data and input scaling to arrive at a fully 8-bit model. Finally, we evaluate the approach in simulation and in flight tests on a Bitcraze Crazyflie, achieving an 80% average success rate in a highly cluttered and randomized test environment. Moreover, the drone finds the light source in 29% fewer steps than a baseline simulation (obstacle avoidance without source information). To our knowledge, this is the first deep reinforcement learning method to enable robust source seeking on such a highly constrained nano drone. Our general methodology is suitable for any highly constrained platform using deep reinforcement learning, source seeking or otherwise.
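The core idea of the quantization scheme described above, calibrating an affine 8-bit mapping from representative input data so the MCU can run purely in integer arithmetic, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names and the choice of per-tensor affine quantization are assumptions for the example.

```python
def quantize_int8(values, rep_min, rep_max):
    """Affine 8-bit quantization: map [rep_min, rep_max] onto [-128, 127].

    rep_min/rep_max come from representative input data gathered offline,
    so the scale and zero point are fixed constants at deployment time.
    """
    scale = (rep_max - rep_min) / 255.0
    zero_point = round(-128 - rep_min / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floating-point values from int8 codes."""
    return [(qi - zero_point) * scale for qi in q]

# Calibrate on a representative range, then quantize live inputs.
sensor_readings = [-1.0, -0.5, 0.0, 0.7, 1.0]
codes, scale, zp = quantize_int8(sensor_readings, rep_min=-1.0, rep_max=1.0)
recovered = dequantize(codes, scale, zp)
```

Because the scale and zero point are computed once from representative data, the on-board network never needs floating-point rescaling at run time, which is what makes a fully 8-bit model feasible on an MCU with a few kilobytes of memory.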
