Optimization of fish-like locomotion using hierarchical reinforcement learning

Flapping-fin propulsion inspired by fish locomotion has attracted considerable research interest for advanced marine propulsion systems. This study optimizes the swimming pattern of fish-like locomotion using hierarchical reinforcement learning. A simplified carangiform fish model is employed, and a segmented tail motion is learned via Q-learning to maximize the average forward velocity produced by flapping the tail fin. The performance of the self-learned swimming pattern is verified and analyzed in terms of flapping efficiency. The results show that a flapping-angle limit of approximately 35 degrees best maximizes forward velocity, and that the hierarchical reinforcement learning approach effectively yields a reasonable solution to this large-scale problem.
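
The tabular Q-learning component described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the real study uses a segmented carangiform tail and hydrodynamic forces, whereas this toy replaces them with a single discretized fin angle, two flap actions, and a crude thrust proxy that rewards motion and penalizes large angles.

```python
import random

# Toy stand-in for the study's setup (assumption: state, action, and
# reward definitions below are illustrative, not from the paper).
# State: index into a discretized fin-angle grid.
# Actions: rotate the fin one 5-degree step down (-1) or up (+1).
ANGLES = list(range(-45, 50, 5))   # flap angles in degrees
ACTIONS = (-1, +1)

def step(state, action):
    """Apply one flap action; return (next_state, reward)."""
    nxt = max(0, min(len(ANGLES) - 1, state + action))
    # Crude thrust proxy: reward actual motion, penalize being
    # pinned at the angle limit, and penalize extreme angles.
    reward = 1.0 if nxt != state else -1.0
    reward -= 0.05 * abs(ANGLES[nxt])
    return nxt, reward

def q_learn(episodes=200, steps=50, alpha=0.5, gamma=0.9,
            eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(len(ANGLES)) for a in ACTIONS}
    for _ in range(episodes):
        s = len(ANGLES) // 2          # start with the fin centered
        for _ in range(steps):
            if rng.random() < eps:    # explore
                a = rng.choice(ACTIONS)
            else:                     # exploit current estimate
                a = max(ACTIONS, key=lambda b: q[(s, b)])
            s2, r = step(s, a)
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q
```

Under this toy reward, the greedy policy learns to reverse the flap near the angle limits and to keep oscillating near the centerline. The hierarchical schemes cited in the study (options, HAMs, MAXQ) would sit above such a low-level learner, composing temporally extended flap strokes from primitive steps.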
