Paper information - Multi-objective reinforcement learning for AUV thruster failure recovery

Multi-objective reinforcement learning for AUV thruster failure recovery

This paper investigates learning approaches for discovering fault-tolerant control policies that overcome thruster failures in Autonomous Underwater Vehicles (AUVs). The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the vehicle. When a fault is detected and isolated, the model of the AUV is reconfigured according to the new condition. To discover a set of optimal solutions, a multi-objective reinforcement learning approach is employed, which can deal with multiple conflicting objectives. Each optimal solution can be used to generate a trajectory that navigates the AUV towards a specified target while satisfying multiple objectives. The discovered policies are executed on the robot in closed loop using the AUV's state feedback. Unlike most existing methods, which disregard the faulty thruster, our approach can also deal with partially broken thrusters to increase the persistent autonomy of the AUV. In addition, the proposed approach is applicable whether the AUV becomes under-actuated or remains redundant in the presence of a fault. We validate the proposed approach on the model of the Girona500 AUV.
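
To make the phrase "a set of optimal solutions" under conflicting objectives concrete, the sketch below filters candidate policies down to the non-dominated (Pareto-optimal) ones. It is only an illustration under stated assumptions: the two objectives (distance to target and energy use), the cost values, and the function names `dominates` and `pareto_front` are hypothetical and are not taken from the paper, whose own search procedure is not reproduced here.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if cost vector a Pareto-dominates b (all objectives minimized).

    a dominates b when a is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs: List[Sequence[float]]) -> List[int]:
    """Return the indices of the cost vectors that no other vector dominates."""
    return [
        i
        for i, c in enumerate(costs)
        if not any(dominates(other, c) for j, other in enumerate(costs) if j != i)
    ]


# Hypothetical costs of five candidate policies, as
# (final distance to target, energy used); lower is better for both.
candidate_costs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.5)]

front = pareto_front(candidate_costs)
print(front)  # prints [0, 1, 3]
```

Policies 2 and 4 are dropped because policy 1 is at least as good on both objectives. The three remaining policies represent different trade-offs, and none of them can be improved on one objective without getting worse on the other.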
