Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle

Autonomous underwater vehicles (AUVs) play an increasingly important role in ocean exploration. Existing AUVs are usually not fully autonomous and are generally limited to pre-planned or pre-programmed tasks. Reinforcement learning (RL) and deep reinforcement learning have been introduced into AUV design and research to improve autonomy. However, these methods are still difficult to apply directly to real AUV systems because of sparse rewards and low learning efficiency. In this paper, we propose a deep interactive reinforcement learning method for AUV path following that combines the advantages of deep reinforcement learning and interactive RL. In addition, since a human trainer cannot provide rewards while the AUV is operating in the ocean, and the AUV needs to adapt to a changing environment, we further propose a deep reinforcement learning method that learns from human rewards and environmental rewards at the same time. We test our methods on two path-following tasks, straight-line following and sinusoidal-curve following, simulated on the Gazebo platform. Our experimental results show that with the proposed deep interactive RL method, the AUV converges faster than a DQN learner trained only on environmental rewards. Moreover, an AUV trained with our deep RL method on both human and environmental rewards achieves performance similar to or even better than that of deep interactive RL, and it can adapt to the actual environment by continuing to learn from environmental rewards.
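The idea of learning from human and environmental rewards at the same time can be illustrated with a small sketch. The abstract does not state how the two reward signals are combined, so the weighted sum below, the `human_weight` parameter, and both function names are assumptions made purely for illustration; only the one-step Q-learning target itself is standard.

```python
from typing import Optional, Sequence


def combined_reward(env_reward: float,
                    human_reward: Optional[float],
                    human_weight: float = 1.0) -> float:
    """Hypothetical combination rule: weighted sum of the two signals.

    human_reward is None when no trainer feedback is available, for
    example once the vehicle is deployed without a human in the loop.
    """
    if human_reward is None:
        return env_reward
    return env_reward + human_weight * human_reward


def td_target(reward: float,
              next_q_values: Sequence[float],
              gamma: float = 0.9,
              done: bool = False) -> float:
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# With trainer feedback: both signals shape the target.
r_train = combined_reward(env_reward=-0.5, human_reward=1.0, human_weight=2.0)
# Without trainer feedback: the learner falls back to the environment alone.
r_deploy = combined_reward(env_reward=-0.5, human_reward=None)
target = td_target(r_train, next_q_values=[0.0, 2.0], gamma=0.5)
```

Under this assumed rule, the same update can be used during training with a human present and during later adaptation, when only the environmental reward remains.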
