Exploring selfish reinforcement learning in repeated games with stochastic rewards

In this paper we introduce a new multi-agent reinforcement learning algorithm called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero-sum games with stochastic rewards by using coordinated exploration. First, two ESRL algorithms are presented, one for common interest games and one for conflicting interest games. Both are based on the same idea: an agent explores by temporarily excluding some of the local actions from its private action space, giving the team of agents the opportunity to look for better solutions in a reduced joint action space. These two algorithms are then combined into one generic algorithm that does not assume the type of game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games, ESRL needs only limited communication to learn a fair periodical policy, resulting in a good overall policy. ESRL agents are independent in the sense that they base their decisions solely on their own action choices and rewards; they are flexible enough to learn different solution concepts; and they can handle stochastic, possibly delayed rewards as well as asynchronous action selection. A real-life experiment on adaptive load-balancing of parallel applications is included.
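The coordinated-exploration idea can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual algorithm: the payoff matrix, the epsilon-greedy value learner, and the phase schedule are all assumptions chosen for clarity. The key ESRL ingredient it does reproduce is that each independent agent, after converging in one exploration phase, excludes its locally converged action so the next phase searches a reduced joint action space.

```python
import random

# Hypothetical common-interest game: both agents receive the same Bernoulli
# reward whose success probability depends on the joint action.  This matrix
# is illustrative only; (0, 0) is the Pareto optimal joint action.
PAYOFF = [
    [0.9, 0.1, 0.1],
    [0.1, 0.6, 0.1],
    [0.1, 0.1, 0.7],
]

class ESRLAgent:
    """Sketch of an independent ESRL-style agent (common-interest case).

    The agent sees only its own actions and rewards.  Within a phase it runs
    a simple epsilon-greedy value learner over its currently available
    actions; at the end of a phase it records the best average reward found,
    then excludes the phase winner from its private action space.
    """

    def __init__(self, n_actions, epsilon=0.1):
        self.epsilon = epsilon
        self.available = list(range(n_actions))
        self.best_action, self.best_value = None, -1.0
        self._reset_estimates()

    def _reset_estimates(self):
        self.values = {a: 0.0 for a in self.available}
        self.counts = {a: 0 for a in self.available}

    def act(self, rng):
        if rng.random() < self.epsilon:
            return rng.choice(self.available)
        return max(self.available, key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental sample-average update of the action value.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

    def end_phase(self):
        # Keep the best action seen over all phases, then exclude the local
        # winner so the next phase explores a reduced joint action space.
        winner = max(self.available, key=lambda a: self.values[a])
        if self.values[winner] > self.best_value:
            self.best_value, self.best_action = self.values[winner], winner
        if len(self.available) > 1:
            self.available.remove(winner)
            self._reset_estimates()

def run_esrl(phases=3, steps=2000, seed=0):
    """Run the exploration phases and return each agent's best action."""
    rng = random.Random(seed)
    agents = [ESRLAgent(3), ESRLAgent(3)]
    for _ in range(phases):
        for _ in range(steps):
            a0, a1 = agents[0].act(rng), agents[1].act(rng)
            reward = 1.0 if rng.random() < PAYOFF[a0][a1] else 0.0
            agents[0].learn(a0, reward)
            agents[1].learn(a1, reward)
        for agent in agents:
            agent.end_phase()
    return tuple(agent.best_action for agent in agents)
```

On this toy game the agents lock onto the joint action (0, 0) in the first phase, explore the reduced spaces {1, 2} in later phases, and finally restrict to the best joint action found overall. Note that no agent ever observes the other's action, matching the independence property described above.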
