iADA*-RL: Anytime Graph-Based Path Planning with Deep Reinforcement Learning for an Autonomous UAV

Path planning algorithms are of paramount importance in guidance and collision avoidance systems, providing trustworthiness and safety for the operation of autonomous unmanned aerial vehicles (UAVs). Previous work has mostly focused on shortest-path discovery, without sufficient consideration of local planning and collision avoidance. In this paper, we propose a hybrid path planning algorithm that combines an anytime graph-based algorithm for global planning with deep reinforcement learning for local planning, applied to a real-time mission planning system for an autonomous UAV. In particular, we aim for a highly autonomous mission planning system that adapts to real-world environments containing both static and moving obstacles. Achieving such adaptive behavior requires a simulator that imitates real environments closely and flexibly enough for the UAV to learn from them; in our scheme, the UAV first learns in simulation and is only then deployed to the real world. The proposed system comprises two main parts: optimal flight path generation and collision avoidance. The global path planning problem is solved in the first stage using a novel anytime incremental search algorithm, improved Anytime Dynamic A* (iADA*). A reinforcement learning method then carries out local planning between consecutive waypoints, steering the UAV around obstacles in real time. The developed hybrid path planning system was implemented and validated in the AirSim environment, where a range of simulations and experiments demonstrate its effectiveness for an autonomous UAV. This study helps expand the existing research on designing efficient and safe path planning algorithms for UAVs.
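To make the global/local division of labor concrete, the following is a minimal sketch of the hybrid loop described above. It is illustrative only: the iADA* global planner is stubbed out with evenly spaced straight-line waypoints, and the trained deep RL local policy is replaced by a hypothetical greedy step that deflects around nearby obstacles. The function names (global_plan, local_policy, fly) and all parameters are assumptions for illustration, not the paper's actual interfaces.

import math

# Stand-in for iADA*: the real algorithm searches a graph incrementally and
# refines the path anytime-style; here we simply emit evenly spaced waypoints
# on the straight line from start to goal (purely illustrative).
def global_plan(start, goal, n_waypoints=5):
    sx, sy = start
    gx, gy = goal
    return [(sx + (gx - sx) * t / n_waypoints,
             sy + (gy - sy) * t / n_waypoints)
            for t in range(1, n_waypoints + 1)]

# Stand-in for the trained RL policy: observe the current position, the next
# waypoint, and nearby obstacles; take a short step toward the waypoint,
# deflected perpendicular to any obstacle inside a safety radius.
def local_policy(pos, waypoint, obstacles, step=0.5, safety=1.0):
    dx, dy = waypoint[0] - pos[0], waypoint[1] - pos[1]
    dist = math.hypot(dx, dy) or 1e-9
    ux, uy = dx / dist, dy / dist
    for ox, oy in obstacles:
        if math.hypot(ox - pos[0], oy - pos[1]) < safety:
            # steer perpendicular to the obstacle direction (greedy avoidance)
            ux, uy = -(oy - pos[1]), (ox - pos[0])
            norm = math.hypot(ux, uy) or 1e-9
            ux, uy = ux / norm, uy / norm
            break
    return pos[0] + step * ux, pos[1] + step * uy

# Hybrid execution loop: the global planner supplies waypoints, and the local
# policy handles the flight between consecutive waypoints, avoiding obstacles.
def fly(start, goal, obstacles, tol=0.3, max_steps=500):
    pos = start
    for wp in global_plan(start, goal):              # global stage (iADA* role)
        for _ in range(max_steps):
            if math.hypot(wp[0] - pos[0], wp[1] - pos[1]) < tol:
                break                                # waypoint reached
            pos = local_policy(pos, wp, obstacles)   # local stage (RL role)
    return pos

if __name__ == "__main__":
    final = fly(start=(0.0, 0.0), goal=(10.0, 10.0), obstacles=[(5.0, 5.0)])
    print("final position:", final)

Running the script prints the final position after the stubbed waypoints thread past a single obstacle; in the actual system, both stubs would be replaced by the anytime incremental iADA* search and the local policy trained in AirSim.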
