SCC-rFMQ Learning in Cooperative Markov Games with Continuous Actions

Many reinforcement learning methods have been proposed for learning optimal solutions in single-agent continuous-action domains, but multiagent coordination domains with continuous actions have received relatively little attention. In this paper, we propose a hierarchical independent-learner method, Sample Continuous Coordination with recursive Frequency Maximum Q-Value (SCC-rFMQ), which divides the coordination problem into two layers. The first layer samples a finite set of actions from the continuous action space using a sampling mechanism with variable exploratory rates; the second layer evaluates the actions in the sampled set and updates the policy using a multiagent reinforcement learning coordination method. By constructing coordination mechanisms at both layers, SCC-rFMQ can handle coordination problems in continuous-action cooperative Markov games effectively. Experimental results show that SCC-rFMQ outperforms other reinforcement learning algorithms.
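The abstract does not spell out the update rules, so the following is only a toy sketch of the two-layer idea, not the SCC-rFMQ algorithm itself. It assumes a single-state game with actions in [0, 1], an FMQ-style score (Q-value plus the frequency of the maximum reward times that maximum), and a shrinking resampling radius standing in for the variable exploratory rate. The class name SampledActionLearner, the reward function joint_reward, and all parameter values are hypothetical.

```python
import random


class SampledActionLearner:
    """Hypothetical independent learner over a finite sample of [low, high]."""

    def __init__(self, k=5, low=0.0, high=1.0, alpha=0.1, radius=0.5, seed=0):
        self.rng = random.Random(seed)
        self.low, self.high = low, high
        self.alpha = alpha          # learning rate (assumed value)
        self.radius = radius        # resampling radius (assumed exploration rate)
        self.actions = [self.rng.uniform(low, high) for _ in range(k)]
        self._reset_stats()

    def _reset_stats(self):
        k = len(self.actions)
        self.q = [0.0] * k                  # running value estimates
        self.r_max = [float("-inf")] * k    # best reward seen per action
        self.n_max = [0] * k                # how often that best reward occurred
        self.n = [0] * k                    # how often each action was played

    def _score(self, i):
        # FMQ-style heuristic (an assumption): Q + freq(max reward) * max reward.
        if self.n[i] == 0:
            return self.q[i]
        return self.q[i] + (self.n_max[i] / self.n[i]) * self.r_max[i]

    def choose(self, epsilon=0.2):
        # Layer 2, action selection: epsilon-greedy over the sampled set.
        if self.rng.random() < epsilon:
            return self.rng.randrange(len(self.actions))
        return max(range(len(self.actions)), key=self._score)

    def update(self, i, reward):
        # Layer 2, evaluation: update statistics of the sampled action i.
        self.n[i] += 1
        if reward > self.r_max[i]:
            self.r_max[i], self.n_max[i] = reward, 1
        elif reward == self.r_max[i]:
            self.n_max[i] += 1
        self.q[i] += self.alpha * (reward - self.q[i])

    def resample(self, shrink=0.8):
        # Layer 1: keep the best action, draw the rest around it, shrink radius.
        best = self.actions[max(range(len(self.actions)), key=self._score)]
        new_actions = [best]
        while len(new_actions) < len(self.actions):
            a = best + self.rng.uniform(-self.radius, self.radius)
            new_actions.append(min(self.high, max(self.low, a)))
        self.actions = new_actions
        self.radius *= shrink
        self._reset_stats()
        return best


def joint_reward(a1, a2):
    # Toy cooperative reward (assumed): maximal when the actions sum to 1.
    return -((a1 + a2 - 1.0) ** 2)


def run(rounds=20, steps=50):
    agents = [SampledActionLearner(seed=1), SampledActionLearner(seed=2)]
    best = []
    for _ in range(rounds):
        for _ in range(steps):
            idx = [ag.choose() for ag in agents]
            a1, a2 = (ag.actions[i] for ag, i in zip(agents, idx))
            r = joint_reward(a1, a2)
            for ag, i in zip(agents, idx):
                ag.update(i, r)       # each agent sees only the shared reward
        best = [ag.resample() for ag in agents]
    return best


if __name__ == "__main__":
    print(run())
```

Each learner observes only its own action and the shared reward, which is what makes it an independent learner. Resetting the statistics after every resampling step is a simplification chosen for brevity; a fuller version would presumably carry value estimates over to the new action set.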
