Reinforcement Learning-based Adaptive Trajectory Planning for AUVs in Under-ice Environments

This work studies online learning-based trajectory planning for multiple autonomous underwater vehicles (AUVs) that estimate a water parameter field of interest in under-ice environments. A centralized system is considered, in which several fixed access points (APs) on the ice layer serve as gateways for communication between the AUVs and a remote data fusion center (FC). We model the water parameter field of interest as a Gaussian process (GP) with unknown hyper-parameters. The AUV sampling trajectories are determined on an epoch-by-epoch basis. At the end of each epoch, the APs relay the field samples observed by all AUVs to the FC, which computes the posterior distribution of the field using Gaussian process regression (GPR) and estimates the field hyper-parameters. The trajectories of all AUVs in the next epoch are then chosen to minimize a long-term cost defined in terms of the field uncertainty reduction and the AUV mobility cost, subject to kinematic, communication-range, and sensing-area constraints. We formulate this adaptive trajectory planning problem as a Markov decision process (MDP) and design a reinforcement learning (RL)-based online learning method that determines the optimal AUV trajectories in a constrained continuous space. Simulation results show that the proposed learning-based trajectory planning algorithm performs comparably to a benchmark method that assumes perfect knowledge of the field hyper-parameters.
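
As a rough illustration of the GPR step performed at the FC, the sketch below computes a posterior mean and variance at query locations from noisy field samples. It is not the authors' implementation: the zero-mean prior, the squared-exponential kernel, the fixed hyper-parameter values, and the names `sq_exp_kernel` and `gp_posterior` are all assumptions made for this example. The abstract does not specify the kernel, and in the paper the hyper-parameters are estimated rather than fixed.

```python
import numpy as np


def sq_exp_kernel(a, b, signal_var=1.0, length_scale=1.0):
    """Squared-exponential covariance between two sets of locations.

    a has shape (n, d) and b has shape (m, d); the result has shape (n, m).
    The kernel choice is an assumption for this sketch.
    """
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(x_obs, y_obs, x_query, noise_var=1e-2, **hyper):
    """Posterior mean and variance of a zero-mean GP at x_query.

    x_obs: (n, d) sampled locations, y_obs: (n,) noisy field samples.
    hyper is forwarded to sq_exp_kernel (signal_var, length_scale).
    """
    # Covariance of the observations, including measurement noise.
    K = sq_exp_kernel(x_obs, x_obs, **hyper) + noise_var * np.eye(len(x_obs))
    # Cross-covariance between observed and query locations.
    K_s = sq_exp_kernel(x_obs, x_query, **hyper)

    # Cholesky factorization for a numerically stable solve.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s.T @ alpha

    # Posterior variance = prior variance minus the explained part.
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(sq_exp_kernel(x_query, x_query, **hyper))
    var = prior_var - np.sum(v**2, axis=0)
    return mean, var


if __name__ == "__main__":
    # Two hypothetical sample locations and one query point between them.
    x_obs = np.array([[0.0, 0.0], [1.0, 0.0]])
    y_obs = np.array([1.0, -1.0])
    x_query = np.array([[0.5, 0.0]])
    mean, var = gp_posterior(x_obs, y_obs, x_query)
    print("posterior mean:", mean, "posterior variance:", var)
```

The gap between the prior and posterior variance, summed over the query locations, is one simple way to quantify the field uncertainty reduction that the cost function refers to; the exact definition used in the paper may differ.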
