A novel modular Q-learning architecture to improve performance under incomplete learning in a grid soccer game

Multi-agent reinforcement learning methods suffer from several deficiencies rooted in the large state space of multi-agent environments. This paper tackles two of these deficiencies: slow learning and low-quality decision-making in the early stages of learning. The proposed methods are applied in a grid-world soccer game. In the proposed approach, modular reinforcement learning is used to reduce the state space of the learning agents from exponential to linear in the number of agents. The modular model proposed here includes two new modules, a partial-module and a single-module, both of which are effective for increasing the speed of learning in a soccer game. We also apply instance-based learning concepts to choose proper actions in states that have not been experienced adequately during learning; the key idea is to use neighbouring states that were explored sufficiently during the learning phase. Experiments in a grid-soccer environment show that the proposed methods produce a higher average reward than the plain modular structure without these extensions.
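
The abstract only outlines the approach, but its two core ideas (per-module Q-tables whose values are combined at action-selection time, and an instance-based fallback to a nearby well-explored state when the current state has been visited too rarely) can be illustrated with a minimal sketch. The Python below is an assumption-laden illustration, not the paper's implementation: the class name `ModularQLearner`, the `visit_threshold` parameter, and the 1-D index distance used as a neighbourhood metric are all hypothetical placeholders.

```python
import numpy as np

class ModularQLearner:
    """Minimal modular Q-learning sketch (illustrative assumptions only).

    Each module observes only a small slice of the global state
    (e.g. the ball plus a single opponent), so each per-module table
    grows linearly with the number of agents rather than exponentially.
    """

    def __init__(self, module_state_sizes, n_actions,
                 alpha=0.1, gamma=0.9, visit_threshold=5):
        self.q = [np.zeros((s, n_actions)) for s in module_state_sizes]
        self.visits = [np.zeros(s, dtype=int) for s in module_state_sizes]
        self.alpha, self.gamma = alpha, gamma
        self.visit_threshold = visit_threshold
        self.n_actions = n_actions

    def _module_values(self, m, s):
        """Q-values of module m for state s, with an instance-based fallback
        to the nearest adequately-visited state when s is under-explored."""
        if self.visits[m][s] >= self.visit_threshold:
            return self.q[m][s]
        explored = np.where(self.visits[m] >= self.visit_threshold)[0]
        if explored.size == 0:
            return self.q[m][s]
        # Borrow values from the closest well-explored state; plain index
        # distance stands in for a real grid-distance metric here.
        nearest = explored[np.argmin(np.abs(explored - s))]
        return self.q[m][nearest]

    def select_action(self, module_states, epsilon=0.1):
        """Epsilon-greedy action over the sum of per-module Q-values."""
        if np.random.rand() < epsilon:
            return np.random.randint(self.n_actions)
        combined = sum(self._module_values(m, s)
                       for m, s in enumerate(module_states))
        return int(np.argmax(combined))

    def update(self, module_states, action, reward, next_module_states):
        """Standard Q-learning update applied independently to each module."""
        for m, (s, s_next) in enumerate(zip(module_states, next_module_states)):
            best_next = np.max(self.q[m][s_next])
            td_target = reward + self.gamma * best_next
            self.q[m][s, action] += self.alpha * (td_target - self.q[m][s, action])
            self.visits[m][s] += 1
```

In this sketch the modules share a single global reward and action, which matches the usual modular Q-learning setting; how the paper's partial-module and single-module are defined, and how module outputs are actually merged, would follow the full text rather than this simplified sum.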
