A path planning strategy unified with a COLREGS collision avoidance function based on deep reinforcement learning and artificial potential field

Abstract Improving the autopilot capability of ships is particularly important for ensuring the safety of maritime navigation. The unmanned surface vessel (USV) with autopilot capability is a development trend for ships of the future. This paper investigates the path planning problem of USVs in uncertain environments and proposes a path planning strategy unified with a collision avoidance function based on deep reinforcement learning (DRL). A deep Q-learning network (DQN) continuously interacts with a visually simulated environment to gather experience data, so that the agent learns the best action strategies in that environment. To handle the collision avoidance problems that may occur during USV navigation, the location of the obstacle ship is divided into four collision avoidance zones according to the International Regulations for Preventing Collisions at Sea (COLREGS). To obtain an improved DRL algorithm, the artificial potential field (APF) method is used to refine the action space and reward function of the DQN algorithm. Simulation experiments test the method in various situations and show that the enhanced DRL can effectively realize autonomous collision avoidance path planning.
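The two ingredients named in the abstract, COLREGS-based encounter zoning and an APF-shaped reward, can be illustrated with a minimal sketch. The sector boundaries below follow common readings of COLREGS Rules 13–15 (112.5° marks "abaft the beam"), and the gain constants `k_att`, `k_rep`, and safe distance `d_safe` are hypothetical placeholders; the paper's exact zone angles and reward coefficients may differ.

```python
def colregs_zone(rel_bearing_deg: float) -> str:
    """Classify an obstacle ship into one of four illustrative COLREGS
    encounter zones by its relative bearing from own ship
    (0 deg = dead ahead, measured clockwise)."""
    b = rel_bearing_deg % 360.0
    if b >= 354.0 or b < 6.0:
        return "head-on"
    if b < 112.5:
        return "crossing-give-way"   # obstacle on the starboard bow
    if b >= 247.5:
        return "crossing-stand-on"   # obstacle on the port bow
    return "overtaking"              # obstacle abaft the beam

def apf_reward(dist_to_goal: float, dist_to_obstacle: float,
               d_safe: float = 50.0, k_att: float = 1.0,
               k_rep: float = 100.0) -> float:
    """APF-style reward shaping for a DQN step: an attractive term pulls
    the agent toward the goal; a repulsive penalty activates only inside
    the safe distance d_safe around an obstacle."""
    reward = -k_att * dist_to_goal
    if dist_to_obstacle < d_safe:
        reward -= k_rep * (1.0 / dist_to_obstacle - 1.0 / d_safe)
    return reward
```

In this shaping scheme the repulsive term vanishes smoothly at `d_safe`, so the reward stays continuous as the obstacle enters or leaves the safety zone, which avoids abrupt value jumps during DQN training.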
