Direct Policy Search for Thruster Failure Recovery in Autonomous Underwater Vehicles

Autonomous underwater vehicles are exposed to many factors that can cause a mission to fail and result in unrecoverable damage. Even robust controllers cannot guarantee that the robot will be able to navigate to a safe location in such situations. In this paper we propose an online learning method that reconfigures the controller so that the robot can recover and survive the mission using the assets still available to the system. The proposed method is framed in the reinforcement learning setting, specifically as a model-based direct policy search approach. Since learning on a damaged vehicle would be impossible owing to time and energy constraints, learning is performed on a model that is identified and kept updated online. We evaluate the applicability of our method with different policy representations and learning algorithms on a model of the Girona500 autonomous underwater vehicle.
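
To make the idea of model-based direct policy search concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a hypothetical one-dimensional vehicle model whose single thruster has lost half of its efficiency, a linear policy with two gains, and a plain random-search optimizer standing in for the learning algorithms. All names, dynamics, and constants here are illustrative assumptions.

```python
import random

def simulate(params, efficiency=0.5, steps=200, dt=0.05):
    """Roll out a linear policy on a toy 1-D model and return its cost.

    The model (degraded thrust plus linear drag) is an assumption made
    for illustration; it is not the Girona500 model.
    """
    k_pos, k_vel = params
    x, v = 1.0, 0.0                      # start 1 m from the target, at rest
    cost = 0.0
    for _ in range(steps):
        u = -k_pos * x - k_vel * v       # policy: state -> thrust command
        u = max(-1.0, min(1.0, u))       # thruster saturation
        a = efficiency * u - 0.5 * v     # degraded thrust plus linear drag
        v += a * dt
        x += v * dt
        cost += (x * x + 0.01 * u * u) * dt
    return cost

def random_search(cost_fn, start, iters=300, sigma=0.5, seed=0):
    """Derivative-free search in policy-parameter space.

    Perturbs the best parameters found so far and keeps a candidate
    only when it lowers the cost measured on the model.
    """
    rng = random.Random(seed)
    best, best_cost = list(start), cost_fn(start)
    for _ in range(iters):
        cand = [p + rng.gauss(0.0, sigma) for p in best]
        c = cost_fn(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

initial = [0.1, 0.0]
initial_cost = simulate(initial)
best_params, best_cost = random_search(simulate, initial)
print("initial cost:", initial_cost)
print("optimized cost:", best_cost, "with gains", best_params)
```

Every candidate policy is evaluated only on the model, which reflects the motivation stated in the abstract: trials on the damaged vehicle itself would cost too much time and energy. In an online setting, the `efficiency` parameter would be re-estimated from sensor data before each search, an identification step that this sketch leaves out.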
