Deep reinforcement learning based optimal trajectory tracking control of autonomous underwater vehicle

This paper addresses trajectory tracking control of Autonomous Underwater Vehicles (AUVs) by applying and improving deep reinforcement learning (DRL). The DRL-based underwater motion control system is composed of two neural networks: one selects an action, and the other evaluates whether the selected action is accurate. Both networks are updated with the deep deterministic policy gradient (DDPG) algorithm, and each is made up of multiple fully connected layers. Theoretical analysis and simulations indicate that, to a certain precision, this algorithm tracks complex curved trajectories more accurately than traditional PID control.
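The two-network structure described above can be illustrated with a minimal sketch. It shows only the forward passes of an action-selecting network (actor) and an action-evaluating network (critic), each built from fully connected layers. The abstract does not give the state and action dimensions, layer widths, or activation functions, so those are assumptions here, as are all function and variable names. The DDPG training updates are omitted.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    # One fully connected layer: small random weights, zero biases.
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def init_mlp(sizes, rng):
    # A stack of fully connected layers, e.g. sizes = [6, 32, 32, 2].
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(mlp, x, out_activation=None):
    # ReLU on hidden layers (an assumption); optional output activation.
    for i, (weights, biases) in enumerate(mlp):
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(mlp) - 1:
            x = [max(0.0, v) for v in x]
        elif out_activation is not None:
            x = [out_activation(v) for v in x]
    return x

# Hypothetical sizes: a 6-element tracking state and 2 control inputs.
STATE_DIM, ACTION_DIM = 6, 2
rng = random.Random(0)
actor = init_mlp([STATE_DIM, 32, 32, ACTION_DIM], rng)
critic = init_mlp([STATE_DIM + ACTION_DIM, 32, 32, 1], rng)

def select_action(state):
    # Actor: maps a state to a bounded, deterministic action in [-1, 1].
    return forward(actor, state, out_activation=math.tanh)

def evaluate(state, action):
    # Critic: maps a (state, action) pair to a scalar value estimate.
    return forward(critic, state + action)[0]

state = [0.1, -0.2, 0.05, 0.0, 0.3, -0.1]  # made-up example state
action = select_action(state)
q_value = evaluate(state, action)
```

In a full DDPG setup, the critic's estimate would be used to adjust the actor's weights, and the critic itself would be trained on observed rewards. The sketch leaves the networks at their random initial weights.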
