Multiagent cooperation and competition with deep reinforcement learning

Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

[1]  J. Meigs,et al.  WHO Technical Report , 1954, The Yale Journal of Biology and Medicine.

[2]  R. Kirk CONVENTION: A PHILOSOPHICAL STUDY , 1970 .

[3]  R. Lathe Phd by thesis , 1988, Nature.

[4]  C. Watkins Learning from delayed rewards , 1989 .

[5]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[6]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[7]  Gerald Tesauro,et al.  Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..

[8]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[9]  Craig Boutilier,et al.  The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[10]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[11]  Peter Dayan,et al.  Technical Note: Q-Learning , 2004, Machine Learning.

[12]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[13]  Bart De Schutter,et al.  A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[14]  K. Binmore Do Conventions Need to Be Common Knowledge? , 2008 .

[15]  Alan K. Mackworth,et al.  Artificial Intelligence - Foundations of Computational Agents , 2010 .

[16]  Hado van Hasselt,et al.  Double Q-learning , 2010, NIPS.

[17]  L Poole David,et al.  Artificial Intelligence: Foundations of Computational Agents , 2010 .

[18]  D. Sumpter Collective Animal Behavior , 2010 .

[19]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[20]  Chengyi Xia,et al.  Finite-time stability of multi-agent system in disturbed environment , 2012 .

[21]  Armin W. Schulz Signals: evolution, learning, and information , 2012 .

[22]  Marco Wiering,et al.  Reinforcement learning in the game of Othello: Learning against a fixed opponent and learning from self-play , 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).

[23]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[24]  Jürgen Schmidhuber,et al.  Evolving large-scale neural networks for vision-based reinforcement learning , 2013, GECCO '13.

[25]  Howard M. Schwartz,et al.  Multi-Agent Machine Learning: A Reinforcement Approach , 2014 .

[26]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[27]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[28]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[29]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[30]  Marcel A. J. van Gerven,et al.  Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream , 2014, The Journal of Neuroscience.

[31]  Jürgen Schmidhuber,et al.  On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models , 2015, ArXiv.

[32]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..

[33]  Shimon Whiteson,et al.  Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks , 2016, ArXiv.

[34]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[35]  J. DiCarlo,et al.  Using goal-driven deep learning models to understand sensory cortex , 2016, Nature Neuroscience.

[36]  Tom Schaul,et al.  Prioritized Experience Replay , 2015, ICLR.