Taming an Autonomous Surface Vehicle for Path Following and Collision Avoidance Using Deep Reinforcement Learning

In this article, we explore the feasibility of applying proximal policy optimization, a state-of-the-art deep reinforcement learning algorithm for continuous control tasks, to the dual-objective problem of controlling an underactuated autonomous surface vehicle to follow an a priori known path while avoiding collisions with non-moving obstacles along the way. The AI agent, which is equipped with multiple rangefinder sensors for obstacle detection, is trained and evaluated in a challenging, stochastically generated simulation environment based on the OpenAI Gym Python toolkit. Notably, the agent is provided with real-time insight into its own reward function, allowing it to dynamically adapt its guidance strategy. Depending on this strategy, which ranges from radical path adherence to radical obstacle avoidance, the trained agent achieves an episodic success rate close to 100%.
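To make the described setup concrete, the sketch below shows how such a training scenario maps onto the classic OpenAI Gym interface (gym.Env with reset/step and Box spaces). It is illustrative only, not the authors' code: the class name ASVPathFollowingEnv, the sensor count N_RANGEFINDERS, the maximum range, and the trade-off parameter lam (standing in for the reward-function insight the agent observes in real time) are all assumptions.

import numpy as np
import gym
from gym import spaces

N_RANGEFINDERS = 64   # hypothetical sensor count; the paper's value may differ
MAX_RANGE = 150.0     # hypothetical maximum rangefinder distance

class ASVPathFollowingEnv(gym.Env):
    """Toy stand-in for the stochastically generated training scenarios."""

    def __init__(self):
        # Continuous control, e.g. normalized surge thrust and rudder angle.
        self.action_space = spaces.Box(low=-1.0, high=1.0,
                                       shape=(2,), dtype=np.float32)
        # Observation: rangefinder distances plus cross-track error, heading
        # error, surge speed, and the reward trade-off parameter that the
        # abstract says the agent observes in real time.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(N_RANGEFINDERS + 4,),
                                            dtype=np.float32)
        self.lam = 0.5

    def reset(self):
        # Sample a new trade-off between path adherence and obstacle avoidance.
        self.lam = float(np.random.uniform(0.0, 1.0))
        return self._observe()

    def step(self, action):
        # A real implementation would integrate the 3-DOF vessel dynamics here
        # and check for collisions with the randomly generated obstacles.
        r_path, r_avoid = 0.0, 0.0  # placeholder reward terms
        reward = self.lam * r_path + (1.0 - self.lam) * r_avoid
        return self._observe(), reward, False, {}

    def _observe(self):
        ranges = np.full(N_RANGEFINDERS, MAX_RANGE, dtype=np.float32)  # no obstacles in this stub
        nav = np.zeros(3, dtype=np.float32)  # cross-track error, heading error, speed
        return np.concatenate([ranges, nav, [self.lam]]).astype(np.float32)

Training could then be driven by any PPO implementation; the abstract names only the algorithm, so the use of Stable Baselines3 below (assuming a version compatible with the classic gym API) and the timestep budget are assumptions for illustration.

from stable_baselines3 import PPO

env = ASVPathFollowingEnv()
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)  # budget is an assumption, not the paper's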
