Bat Q-learning Algorithm

The cooperative Q-learning approach allows multiple learners to learn independently and then share their Q-values with each other using a Q-value sharing strategy. One problem with this approach is that the learners' solutions may not converge to optimality, because the optimal Q-values may never be found. Another problem is that some cooperative algorithms perform very well on single-task problems but quite poorly on multi-task problems. This paper proposes a new cooperative Q-learning algorithm, the Bat Q-learning algorithm (BQ-learning), which implements a Q-value sharing strategy based on the Bat algorithm. The Bat algorithm is a powerful optimization algorithm that increases the likelihood of finding the optimal Q-values by balancing the exploration and exploitation of actions through tuning of its parameters. BQ-learning was tested on two problems: the shortest path problem (a single-task problem) and the taxi problem (a multi-task problem). The experimental results suggest that BQ-learning performs better than single-agent Q-learning and several well-known cooperative Q-learning algorithms.
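The abstract describes the Bat algorithm only at a high level, as an optimizer whose parameters trade off exploration against exploitation. To make those parameters concrete, here is a minimal sketch of the standard Bat algorithm (Yang, 2010) minimizing a toy objective. It is not the BQ-learning Q-value sharing strategy: how BQ-learning maps each learner's Q-values onto a bat's position, and which fitness it uses, is not stated in the abstract. The function name `bat_algorithm`, the toy objective `sphere`, the parameter names (`f_min`, `f_max`, `loudness0`, `pulse_rate0`, `alpha`, `gamma`) and their defaults are all assumptions chosen for illustration.

```python
# Hypothetical sketch: the standard Bat algorithm on a toy objective,
# NOT the paper's BQ-learning implementation.
import math
import random
from typing import Callable, List, Tuple


def bat_algorithm(
    objective: Callable[[List[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_bats: int = 20,
    n_iter: int = 200,
    f_min: float = 0.0,
    f_max: float = 2.0,
    loudness0: float = 1.0,
    pulse_rate0: float = 0.5,
    alpha: float = 0.9,
    gamma: float = 0.9,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Minimize `objective`; return (best_position, best_fitness, history)."""
    rng = random.Random(seed)

    def clip(x: List[float]) -> List[float]:
        # Keep every coordinate inside the search box.
        return [min(max(v, lower), upper) for v in x]

    # Random initial positions, zero velocities, per-bat loudness and pulse rate.
    pos = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(x) for x in pos]
    loud = [loudness0] * n_bats
    rate = [pulse_rate0] * n_bats

    best_i = min(range(n_bats), key=lambda i: fit[i])
    best, best_f = pos[best_i][:], fit[best_i]
    history = [best_f]  # best fitness before the first iteration

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            # Global move: a random frequency scales the pull relative to the best bat.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - b) * freq for v, x, b in zip(vel[i], pos[i], best)]
            cand = clip([x + v for x, v in zip(pos[i], vel[i])])

            # Local move: random walk around the best, taken when a uniform
            # draw exceeds this bat's pulse rate; step size is the mean loudness.
            if rng.random() > rate[i]:
                mean_loud = sum(loud) / n_bats
                cand = clip([b + rng.uniform(-1.0, 1.0) * mean_loud for b in best])

            cand_f = objective(cand)

            # Accept a non-worse move with probability equal to the loudness;
            # on acceptance the bat gets quieter and its pulse rate is reset.
            if cand_f <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_f
                loud[i] *= alpha
                rate[i] = pulse_rate0 * (1.0 - math.exp(-gamma * t))

            # Track the global best independently of the acceptance test.
            if cand_f <= best_f:
                best, best_f = cand[:], cand_f
        history.append(best_f)

    return best, best_f, history


def sphere(x: List[float]) -> float:
    # Toy objective with its minimum 0 at the origin.
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, best_f, history = bat_algorithm(sphere, dim=2, lower=-5.0, upper=5.0)
    print(best, best_f)
```

In this formulation, loudness and pulse rate are the tuning knobs the abstract alludes to: each accepted move multiplies a bat's loudness by `alpha`, which both shrinks the radius of the local random walk and lowers the chance of accepting further moves, while `f_min`/`f_max` control how strongly the global move is pulled relative to the current best.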

Bilal H. Abed-alguni
