Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks

Nash Q-learning and team Q-learning are extensions of reinforcement learning for use in multi-agent systems as cooperation mechanisms. The complexity of multi-agent reinforcement learning is extremely high, so complexity-reduction methods such as hierarchical structures, abstraction, and task decomposition are necessary. A typical approach to task decomposition defines subtasks by extracting bottleneck states. In this paper, bottlenecks are extracted automatically to create temporally extended actions, which are then added to the agents' available actions in the cooperation mechanisms of multi-agent systems. The update equations of team Q-learning and Nash Q-learning are extended so that they accommodate these temporally extended actions, which considerably improves the learning performance of both methods. Experimental results on multi-agent problems show a clear improvement in the learning of the cooperation mechanisms when they are augmented with the extracted temporally extended actions.
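For intuition, the minimal sketch below shows one way an update rule of this kind can look: an SMDP-style team Q-learning update over joint actions, where an entry of the joint action may be an option (temporally extended action) rather than a primitive action, so the bootstrap term is discounted by gamma^k for an action that lasted k primitive steps. The function name team_q_update, the dictionary-based Q-table, and the example states and option names are illustrative assumptions, not the paper's actual implementation.

```python
def team_q_update(Q, state, joint_action, cum_reward, k, next_state,
                  alpha=0.1, gamma=0.95):
    """One SMDP-style team Q-learning update on a shared tabular Q.

    Q            : dict mapping (state, joint_action) -> value
    joint_action : tuple with one entry per agent; each entry is either a
                   primitive action or a temporally extended action (option)
    cum_reward   : discounted reward accumulated while the joint action ran
    k            : number of primitive steps it lasted (1 for primitive actions)
    """
    # Greedy joint value at the next state; in team Q-learning all agents
    # maximize the same shared value, so a plain max over joint actions suffices.
    next_values = [v for (s, _), v in Q.items() if s == next_state]
    best_next = max(next_values) if next_values else 0.0

    key = (state, joint_action)
    old = Q.get(key, 0.0)
    # Discount the bootstrap by gamma ** k to account for the option's duration.
    Q[key] = old + alpha * (cum_reward + gamma ** k * best_next - old)
    return Q


# Illustrative usage (hypothetical states and option names):
# agent 0 ran a 4-step option toward a bottleneck doorway while agent 1 waited.
Q = {}
team_q_update(Q, state="room_A", joint_action=("goto_door", "wait"),
              cum_reward=2.3, k=4, next_state="doorway")
```

In the Nash Q-learning variant, the max over the shared value would be replaced by the value of a Nash equilibrium of the stage game at the next state, with each agent maintaining its own Q-table; the gamma^k discounting for temporally extended actions applies in the same way.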
