Paper information - Multiagent Reinforcement Learning with Spiking and Non-Spiking Agents in the Iterated Prisoner's Dilemma

This paper investigates Multiagent Reinforcement Learning (MARL) in a general-sum game whose payoff structure requires the agents to exploit each other in a way that benefits all agents. The contradictory nature of such games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and non-spiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome. According to the results, cooperation is enhanced by: (i) a mixture of positive and negative payoff values and a high discount factor in the case of non-spiking agents, and (ii) a longer eligibility trace time constant in the case of spiking agents. Moreover, it is shown that spiking and non-spiking agents behave similarly and can therefore be used equally well in any multiagent interaction setting. For training the spiking agents, a novel and necessary modification that enhances competition is applied to an existing learning rule based on stochastic synaptic transmission.
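To make the role of the payoff values and the discount factor concrete, the following is a minimal sketch rather than the paper's method. It checks the standard Prisoner's Dilemma conditions and compares the discounted return of cooperating forever with that of defecting once against a partner who then defects forever. The payoff numbers, the `Payoffs` class, the function names, and the "defect forever after one defection" partner are all assumptions introduced for illustration; the paper's agents learn their behaviour through reinforcement learning and are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payoffs:
    """Per-round payoffs for one player (hypothetical container)."""
    temptation: float   # T: I defect, partner cooperates
    reward: float       # R: both cooperate
    punishment: float   # P: both defect
    sucker: float       # S: I cooperate, partner defects

    def is_prisoners_dilemma(self) -> bool:
        # Standard conditions: T > R > P > S, and 2R > T + S so that
        # mutual cooperation beats taking turns exploiting each other.
        t, r, p, s = self.temptation, self.reward, self.punishment, self.sucker
        return t > r > p > s and 2 * r > t + s


def return_cooperate_forever(pay: Payoffs, gamma: float) -> float:
    """Discounted return of mutual cooperation in every round."""
    return pay.reward / (1.0 - gamma)


def return_defect_once(pay: Payoffs, gamma: float) -> float:
    """Discounted return of defecting now, then mutual defection forever.

    Assumes a partner that never cooperates again after being exploited.
    """
    return pay.temptation + gamma * pay.punishment / (1.0 - gamma)


def cooperation_threshold(pay: Payoffs) -> float:
    """Smallest discount factor at which cooperating forever is at least
    as good as defecting once, under the assumption above."""
    return (pay.temptation - pay.reward) / (pay.temptation - pay.punishment)


# Assumed payoffs mixing positive and negative values (illustrative only).
PAYOFFS = Payoffs(temptation=5.0, reward=4.0, punishment=-2.0, sucker=-3.0)

if __name__ == "__main__":
    print("valid dilemma:", PAYOFFS.is_prisoners_dilemma())
    print("threshold:", cooperation_threshold(PAYOFFS))
    for gamma in (0.1, 0.9):
        print(gamma,
              return_cooperate_forever(PAYOFFS, gamma),
              return_defect_once(PAYOFFS, gamma))
```

Under these assumptions, a low discount factor makes the one-off temptation payoff dominate, while a high discount factor makes the long run of mutual-cooperation rewards dominate, which is in line with the abstract's observation about non-spiking agents.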
