Multi Pseudo Q-Learning-Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Unlike existing policy gradient methods, which employ a single actor-critic pair and cannot achieve satisfactory tracking accuracy or stable learning, the proposed algorithm attains high tracking accuracy and stable learning for AUVs through a hybrid actors-critics architecture, in which multiple actors and multiple critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, for the critics, an updating rule based on the expected absolute Bellman error selects the worst critic to be updated at each time step. To compute a more accurate target value in the loss function of the chosen critic, Pseudo Q-learning, which replaces the greedy policy of Q-learning with a subgreedy policy, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce overestimation of the action-value function and to stabilize learning. For the actors, the deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but poor updates. Moreover, a qualitative stability analysis of the learning process is given. The effectiveness and generality of the proposed MPQ-based deterministic policy gradient (MPQ-DPG) algorithm are verified on an AUV tracking two different reference trajectories. The results demonstrate the high tracking accuracy and stable learning of MPQ-DPG, and further show that increasing the number of actors and critics improves performance.
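The update structure described above can be sketched in code. The following is a minimal toy illustration, not the paper's implementation: linear critics and actors, a scalar action, the averaged-actor policy, and a critic-selection rule based on the expected absolute Bellman error. The subgreedy target of Pseudo Q-learning is approximated here by evaluating the averaged policy under a conservative minimum over critics (an assumption intended to mimic the overestimation reduction of MPQ); all names, dimensions, and learning rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: linear critics Q_i(s, a) = w_i^T phi(s, a) and
# linear actors mu_j(s) = theta_j^T s. Dimensions are assumptions.
STATE_DIM, NUM_CRITICS, NUM_ACTORS = 3, 4, 4
GAMMA = 0.99

def phi(s, a):
    # Simple joint state-action feature vector for a scalar action.
    return np.concatenate([s, [a, a * s[0]]])

critics = [rng.normal(size=STATE_DIM + 2) for _ in range(NUM_CRITICS)]
actors = [rng.normal(size=STATE_DIM) for _ in range(NUM_ACTORS)]

def policy(s):
    # Final learned policy: the average of all actors, as in the abstract.
    return float(np.mean([theta @ s for theta in actors]))

def bellman_error(w, s, a, r, s2):
    # Stand-in for the subgreedy action of Pseudo Q-learning: the averaged
    # policy's action, with a min over critics as a conservative target
    # (our assumption for how MPQ curbs overestimation).
    a2 = policy(s2)
    target = r + GAMMA * min(wk @ phi(s2, a2) for wk in critics)
    return target - w @ phi(s, a)

def mpq_critic_update(batch, lr=1e-2):
    # Select the "worst" critic: largest expected absolute Bellman error
    # over the batch, then update only that critic this step.
    errs = [np.mean([abs(bellman_error(w, *t)) for t in batch])
            for w in critics]
    worst = int(np.argmax(errs))
    for s, a, r, s2 in batch:
        delta = bellman_error(critics[worst], s, a, r, s2)
        critics[worst] += lr * delta * phi(s, a)  # semi-gradient TD step
    return worst
```

Updating only the critic with the largest expected absolute Bellman error concentrates learning on the least accurate value estimate, while the averaged-actor policy smooths out any single actor's large but poor update.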
