Multiagent Reinforcement Learning with Spiking and Non-Spiking Agents in the Iterated Prisoner's Dilemma

This paper investigates Multiagent Reinforcement Learning (MARL) in a general-sum game where the payoffs' structure is such that the agents are required to exploit each other in a way that benefits all agents. The contradictory nature of these games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and non-spiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome. According to the results, this is enhanced by: (i) a mixture of positive and negative payoff values and a high discount factor in the case of non-spiking agents and (ii) having longer eligibility trace time constant in the case of spiking agents. Moreover, it is shown that spiking and non-spiking agents have similar behaviour and therefore they can equally well be used in any multiagent interaction setting. For training the spiking agents, a novel and necessary modification enhances competition to an existing learning rule based on stochastic synaptic transmission.

[1]  Andrew W. Moore,et al.  Gradient Descent for General Reinforcement Learning , 1998, NIPS.

[2]  Xiaohui Xie,et al.  Learning in neural networks by reinforcement of irregular spiking. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[3]  Vincent Conitzer,et al.  AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.

[4]  Geoffrey J. Gordon Agendas for multi-agent learning , 2007, Artif. Intell..

[5]  Markus Diesmann,et al.  A Spiking Neural Network Model of an Actor-Critic Learning Agent , 2009, Neural Computation.

[6]  Razvan V. Florian,et al.  Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity , 2007, Neural Computation.

[7]  Peter Stone,et al.  Multiagent learning is not the answer. It is the question , 2007, Artif. Intell..

[8]  H. Seung,et al.  Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Transmission , 2003, Neuron.

[9]  Daniel Kudenko,et al.  Reinforcement learning of coordination in heterogeneous cooperative multi-agent systems , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[10]  Yoav Shoham,et al.  If multi-agent learning is the answer, what is the question? , 2007, Artif. Intell..

[11]  Manuela M. Veloso,et al.  Rational and Convergent Learning in Stochastic Games , 2001, IJCAI.

[12]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[13]  Alvin E. Roth,et al.  Multi-agent learning and the descriptive value of simple models , 2007, Artif. Intell..

[14]  Bart De Schutter,et al.  A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[15]  J. Nash Equilibrium Points in N-Person Games. , 1950, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Robert A. Legenstein,et al.  A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback , 2008, PLoS Comput. Biol..

[17]  Bikramjit Banerjee,et al.  Convergent Gradient Ascent in General-Sum Games , 2002, ECML.

[18]  Simon Parsons,et al.  What evolutionary game theory tells us about multiagent learning , 2007, Artif. Intell..

[19]  L. Abbott,et al.  Synaptic plasticity: taming the beast , 2000, Nature Neuroscience.

[20]  Michael P. Wellman,et al.  Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..

[21]  David Kraines,et al.  The Threshold of Cooperation Among Adaptive Agents: Pavlov and the Stag Hunt , 1996, ATAL.

[22]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[23]  Robert H. Crites,et al.  Multiagent reinforcement learning in the Iterated Prisoner's Dilemma. , 1996, Bio Systems.

[24]  Drew Fudenberg,et al.  An economist's perspective on multi-agent learning , 2007, Artif. Intell..

[25]  A. Rapoport,et al.  Prisoner's Dilemma: A Study in Conflict and Co-operation , 1970 .

[26]  Michael L. Littman,et al.  A hierarchy of prescriptive goals for multiagent learning , 2007, Artif. Intell..

[27]  Yoav Shoham,et al.  A general criterion and an algorithmic framework for learning in multi-agent systems , 2007, Machine Learning.

[28]  Craig Boutilier,et al.  The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[29]  Marco Wiering,et al.  Convergence and Divergence in Standard and Averaging Reinforcement Learning , 2004, ECML.

[30]  Michael L. Littman,et al.  Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.