Reinforcement Learning-Based Multi-AUV Adaptive Trajectory Planning for Under-Ice Field Estimation

This work studies online learning-based trajectory planning for multiple autonomous underwater vehicles (AUVs) that estimate a water parameter field of interest in an under-ice environment. A centralized system is considered, in which several fixed access points on the ice layer serve as gateways for communication between the AUVs and a remote data fusion center. We model the water parameter field of interest as a Gaussian process with unknown hyper-parameters. The AUV sampling trajectories are determined on an epoch-by-epoch basis. At the end of each epoch, the access points relay the field samples observed by all the AUVs to the fusion center, which computes the posterior distribution of the field using Gaussian process regression and estimates the field hyper-parameters. The trajectories of all the AUVs in the next epoch are then chosen to maximize a long-term reward defined in terms of the field uncertainty reduction and the AUV mobility cost, subject to kinematic, communication, and sensing-area constraints. We formulate the adaptive trajectory planning problem as a Markov decision process (MDP) and design a reinforcement learning-based online learning algorithm to determine the optimal AUV trajectories in a constrained continuous space. Simulation results show that the proposed learning-based trajectory planning algorithm performs comparably to a benchmark method that assumes perfect knowledge of the field hyper-parameters.
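As a rough illustration of the per-epoch computation described above, the following Python sketch computes a Gaussian process posterior at a set of query locations and a one-epoch reward that trades uncertainty reduction against mobility cost. It is a minimal sketch under stated assumptions, not the authors' implementation: the zero-mean prior, the squared-exponential kernel, the use of summed posterior variance as the uncertainty measure, the linear path-length cost, and all names (`sq_exp_kernel`, `gp_posterior`, `epoch_reward`, `cost_weight`) are hypothetical choices made for illustration. The hyper-parameters are held fixed here, whereas the abstract states that they are estimated online.

```python
import numpy as np


def sq_exp_kernel(a, b, signal_var=1.0, length_scale=1.0):
    """Squared-exponential covariance between point sets a (n, d) and b (m, d).

    The kernel choice is an assumption; the abstract does not specify one.
    """
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_obs, y_obs, x_query, noise_var=0.01, **hyper):
    """Posterior mean and variance of a zero-mean GP at x_query."""
    k_oo = sq_exp_kernel(x_obs, x_obs, **hyper) + noise_var * np.eye(len(x_obs))
    k_qo = sq_exp_kernel(x_query, x_obs, **hyper)
    # Cholesky factorization avoids forming an explicit matrix inverse.
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_qo @ alpha
    v = np.linalg.solve(chol, k_qo.T)
    prior_var = np.diag(sq_exp_kernel(x_query, x_query, **hyper))
    var = prior_var - np.sum(v**2, axis=0)
    return mean, var


def epoch_reward(var_before, var_after, path_length, cost_weight=0.1):
    """Hypothetical one-epoch reward: variance reduction minus weighted travel cost."""
    reduction = float(np.sum(var_before) - np.sum(var_after))
    return reduction - cost_weight * path_length


if __name__ == "__main__":
    # Two samples collected by the AUVs (2-D positions) and two query locations.
    x_obs = np.array([[0.0, 0.0], [1.0, 1.0]])
    y_obs = np.array([0.5, -0.2])
    x_query = np.array([[0.0, 0.0], [5.0, 5.0]])

    mean, var = gp_posterior(x_obs, y_obs, x_query)
    prior_var = np.ones(len(x_query))  # prior variance equals signal_var = 1.0
    print("posterior mean:", mean)
    print("posterior variance:", var)
    print("reward:", epoch_reward(prior_var, var, path_length=2.0))
```

In this sketch the query point that coincides with a sample location ends up with a much smaller posterior variance than the distant one, which is the effect that an uncertainty-based reward is meant to capture when steering the AUVs toward poorly observed regions.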
