Integrating human experience in deep reinforcement learning for multi-UAV collision detection and avoidance

Purpose: This paper aims to realize fully distributed collision detection and avoidance for multiple unmanned aerial vehicles (UAVs) based on deep reinforcement learning (DRL), to address the low sample efficiency of DRL and speed up training, and to improve the applicability and reliability of DRL-based approaches in multi-UAV control problems.

Design/methodology/approach: A fully distributed, DRL-based collision detection and avoidance approach for multi-UAV systems is proposed. Human experience is integrated into policy training through a human experience-based adviser. A hybrid control method combines the learning-based policy with traditional model-based control. Extensive experiments, including simulations, real flights and comparative experiments, are conducted to evaluate the performance of the approach.

Findings: A fully distributed multi-UAV collision detection and avoidance method based on DRL is realized. The reward curves show that integrating human experience significantly accelerates training and yields a higher mean episode reward than the pure DRL method. The experimental results show that the DRL method with human experience integration significantly outperforms the pure DRL method in multi-UAV collision detection and avoidance. The safer flight provided by the hybrid control method has also been validated.

Originality/value: The fully distributed architecture is suitable for large-scale UAV swarms and real applications. Integrating human experience significantly accelerates training compared with the pure DRL method. The proposed hybrid control strategy compensates for the shortcomings of two-dimensional light detection and ranging (LiDAR) and for other practical problems that arise in applications.
