Bat Q-learning Algorithm

The cooperative Q-learning approach allows multiple learners to learn independently and then share their Q-values with one another using a Q-value sharing strategy. A main problem with this approach is that the learners' solutions may not converge to optimality, because the optimal Q-values may not be found. Another problem is that some cooperative algorithms perform very well on single-task problems but quite poorly on multi-task problems. This paper proposes a new cooperative Q-learning algorithm, the Bat Q-learning algorithm (BQ-learning), which implements a Q-value sharing strategy based on the Bat algorithm. The Bat algorithm is a powerful optimization algorithm whose parameters can be tuned to balance the exploration and exploitation of actions, which increases the chance of finding the optimal Q-values. The BQ-learning algorithm was tested on two problems: the shortest path problem (a single-task problem) and the taxi problem (a multi-task problem). The experimental results suggest that BQ-learning performs better than single-agent Q-learning and some well-known cooperative Q-learning algorithms.
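
To make the setting concrete, the following is a minimal sketch of the generic cooperative Q-learning pattern described above: each learner applies the standard tabular Q-learning update independently, and a sharing step then combines the learners' Q-tables. The sharing rule shown here (every learner copies the table of the best-scoring learner) is only an illustrative placeholder and is not the Bat-algorithm-based strategy of BQ-learning, whose details are not given in this abstract. The function names `q_update` and `share_best`, the `scores` argument, and the default values of `alpha` and `gamma` are assumptions made for the example.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step tabular Q-learning update, applied in place.

    alpha (learning rate) and gamma (discount factor) are assumed defaults.
    """
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def share_best(q_tables, scores):
    """Placeholder sharing step (NOT the BQ-learning strategy).

    Every learner overwrites its Q-table with a copy of the table that
    belongs to the learner with the highest score. `scores` is a
    hypothetical per-learner performance measure, e.g. average reward.
    """
    best = q_tables[int(np.argmax(scores))].copy()
    for Q in q_tables:
        Q[...] = best  # in-place copy so callers keep their references
    return q_tables

# Two learners on a toy problem with 2 states and 2 actions.
tables = [np.zeros((2, 2)), np.zeros((2, 2))]
q_update(tables[0], s=0, a=1, r=1.0, s_next=1)  # learner 0 observes a reward
share_best(tables, scores=[1.0, 0.0])           # learner 1 adopts learner 0's table
```

A greedy copy like this is purely exploitative, so all learners collapse onto one table after every sharing step. The abstract's point is that a sharing strategy should instead balance exploration and exploitation, which is the role it assigns to the Bat algorithm.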
