Taming an Autonomous Surface Vehicle for Path Following and Collision Avoidance Using Deep Reinforcement Learning

In this article, we explore the feasibility of applying proximal policy optimization, a state-of-the-art deep reinforcement learning algorithm for continuous control tasks, to the dual-objective problem of controlling an underactuated autonomous surface vehicle to follow an a priori known path while avoiding collisions with non-moving obstacles along the way. The AI agent, which is equipped with multiple rangefinder sensors for obstacle detection, is trained and evaluated in a challenging, stochastically generated simulation environment based on the OpenAI Gym Python toolkit. Notably, the agent is provided with real-time insight into its own reward function, allowing it to dynamically adapt its guidance strategy. Depending on this strategy, which ranges from radical path adherence to radical obstacle avoidance, the trained agent achieves an episodic success rate close to 100%.
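To make the described setup concrete, the sketch below shows how such a training scenario maps onto the classic OpenAI Gym interface (gym.Env with reset/step and Box spaces). It is illustrative only, not the authors' code: the class name ASVPathFollowingEnv, the sensor count N_RANGEFINDERS, the maximum range, and the trade-off parameter lam (standing in for the reward-function insight the agent observes in real time) are all assumptions.

import numpy as np
import gym
from gym import spaces

N_RANGEFINDERS = 64   # hypothetical sensor count; the paper's value may differ
MAX_RANGE = 150.0     # hypothetical maximum rangefinder distance

class ASVPathFollowingEnv(gym.Env):
    """Toy stand-in for the stochastically generated training scenarios."""

    def __init__(self):
        # Continuous control, e.g. normalized surge thrust and rudder angle.
        self.action_space = spaces.Box(low=-1.0, high=1.0,
                                       shape=(2,), dtype=np.float32)
        # Observation: rangefinder distances plus cross-track error, heading
        # error, surge speed, and the reward trade-off parameter that the
        # abstract says the agent observes in real time.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(N_RANGEFINDERS + 4,),
                                            dtype=np.float32)
        self.lam = 0.5

    def reset(self):
        # Sample a new trade-off between path adherence and obstacle avoidance.
        self.lam = float(np.random.uniform(0.0, 1.0))
        return self._observe()

    def step(self, action):
        # A real implementation would integrate the 3-DOF vessel dynamics here
        # and check for collisions with the randomly generated obstacles.
        r_path, r_avoid = 0.0, 0.0  # placeholder reward terms
        reward = self.lam * r_path + (1.0 - self.lam) * r_avoid
        return self._observe(), reward, False, {}

    def _observe(self):
        ranges = np.full(N_RANGEFINDERS, MAX_RANGE, dtype=np.float32)  # no obstacles in this stub
        nav = np.zeros(3, dtype=np.float32)  # cross-track error, heading error, speed
        return np.concatenate([ranges, nav, [self.lam]]).astype(np.float32)

Training could then be driven by any PPO implementation; the abstract names only the algorithm, so the use of Stable Baselines3 below (assuming a version compatible with the classic gym API) and the timestep budget are assumptions for illustration.

from stable_baselines3 import PPO

env = ASVPathFollowingEnv()
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)  # budget is an assumption, not the paper's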
