Path Planning of Robotic Fish in Unknown Environment with Improved Reinforcement Learning Algorithm

Path planning is a primary task for a robotic fish, especially when its underwater environment is unknown. Conventional reinforcement learning algorithms usually exhibit poor convergence in unknown environments. To find the optimal path and increase the convergence speed in an unknown environment, an improved reinforcement learning method that incorporates simulated annealing is proposed for robotic fish navigation. A simulated annealing policy with a novel cooling schedule, rather than the common ε-greedy policy, is adopted for action selection. The convergence speed of the algorithm is further improved by a novel goal-oriented reward function, and the stopping condition of the proposed reinforcement learning algorithm is revised accordingly. In this work, a robotic fish is designed and its prototype is fabricated by 3D printing. The proposed algorithm is then evaluated in a 2D unknown environment to obtain greedy actions. Experimental results show that the proposed algorithm generates an optimal path for the robotic fish in an unknown environment, increases the convergence speed, and balances exploration and exploitation.
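The abstract does not give implementation details, but the core idea of replacing ε-greedy exploration with simulated-annealing action selection in tabular Q-learning can be illustrated with a minimal sketch. The environment interface (reset/step), the exponential cooling schedule, and all parameter names (temperature, cooling_rate, alpha, gamma) below are illustrative assumptions, not the paper's exact method.

```python
import math
import random

# Minimal sketch: tabular Q-learning with Metropolis-style simulated-annealing
# action selection. The cooling schedule and reward shaping used in the paper
# are not specified in the abstract; the choices here are assumptions.

def sa_select_action(Q, state, actions, temperature):
    """Choose between the greedy action and a random candidate using the
    Metropolis acceptance criterion at the current temperature."""
    greedy = max(actions, key=lambda a: Q[(state, a)])
    candidate = random.choice(actions)
    delta = Q[(state, candidate)] - Q[(state, greedy)]
    # Accept the exploratory action with probability exp(delta / T):
    # high temperature favors exploration, low temperature favors exploitation.
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return candidate
    return greedy

def train(env, actions, episodes=500, alpha=0.1, gamma=0.9,
          t0=1.0, t_min=0.01, cooling_rate=0.99):
    """Run Q-learning with simulated-annealing exploration on an environment
    assumed to expose reset() -> state and step(a) -> (state, reward, done)."""
    Q = {}
    temperature = t0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            for a in actions:
                Q.setdefault((state, a), 0.0)
            action = sa_select_action(Q, state, actions, temperature)
            next_state, reward, done = env.step(action)
            for a in actions:
                Q.setdefault((next_state, a), 0.0)
            best_next = max(Q[(next_state, a)] for a in actions)
            # Standard Q-learning update.
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
        # Cool the temperature after each episode (assumed exponential schedule).
        temperature = max(t_min, temperature * cooling_rate)
    return Q
```

As the temperature decays, the Metropolis acceptance probability shrinks, so the policy gradually shifts from exploration toward exploitation, which is the balance the abstract attributes to the annealing-based action selection.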