Deep Reinforcement Learning Variants of Multi-Agent Learning Algorithms

We introduce Deep Repeated Update Q-Network (DRUQN) and Deep Loosely Coupled Q-Network (DLCQN), two novel variants of Deep Q-Network (DQN). These algorithms are designed to provide architectures that are better suited to handling interactions between multiple agents and robust enough to deal with the non-stationarity produced by concurrent learning. We approach this from two different fronts. DRUQN addresses Q-Learning's tendency to favor the update of certain action-values, which may lead to decreased performance in rapidly changing environments. Meanwhile, DLCQN learns to decompose the state space into two parts: (1) states where it is sensible or necessary to act independently and (2) states where acting in coordination with another agent may lead to a better outcome. We use Pong as a testing environment and compare the performance of DRUQN, DLCQN and DQN in different competitive and cooperative experiments. The results demonstrate that for some tasks DLCQN and DRUQN outperform DQN, which hints at the need to develop and use architectures capable of coping with richer and more complex dynamics.
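
To make the update bias concrete, the following is a minimal tabular sketch of a repeated-update rule of the kind DRUQN builds on. It is not the deep architecture described above. It assumes the closed-form rule from the Repeated Update Q-learning literature, in which the learning rate is effectively scaled by the inverse of the probability of the chosen action, so that rarely selected actions receive a proportionally larger update. The function name `ruq_update`, the list-of-lists table layout, and the default `alpha` and `gamma` values are illustrative assumptions, not taken from the paper.

```python
def ruq_update(q, state, action, reward, next_state, policy_prob,
               alpha=0.1, gamma=0.99):
    """One tabular repeated-update step (illustrative sketch only).

    q           -- table indexed as q[state][action]
    policy_prob -- probability with which the current policy picked `action`
                   in `state`; must be in (0, 1]
    """
    # Fraction of the old estimate that is kept. With policy_prob == 1 this
    # is (1 - alpha), i.e. the ordinary Q-learning step; smaller
    # probabilities keep less of the old value, as if the update had been
    # repeated 1 / policy_prob times.
    keep = (1.0 - alpha) ** (1.0 / policy_prob)

    # Standard one-step Q-learning target.
    target = reward + gamma * max(q[next_state])

    q[state][action] = keep * q[state][action] + (1.0 - keep) * target
    return q[state][action]
```

With `policy_prob=1.0` the step coincides with standard Q-learning, which makes the sketch easy to compare against a plain baseline.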
