Two-Stage Reinforcement Learning Algorithm for Quick Cooperation in Repeated Games

People often learn their behavior from its outcomes, such as success and failure. Moreover, people do not act in isolation; they interact with one another every day. To model such learning and interaction, we consider reinforcement learning agents playing games. Many reinforcement learning algorithms have been studied for obtaining good strategies in games. However, most of these algorithms are “suspicious”, i.e., they focus on avoiding exploitation by greedy opponents, so it takes a long time for such agents to establish cooperation. On the other hand, if the agents are “innocent”, i.e., prone to trust others, they establish cooperation easily but are exploited by acquisitive opponents. In this work, we propose an algorithm that combines two complementary algorithms, using an “innocent” one in the early stage and a “suspicious” one in the late stage. The combined algorithm allows an agent to cooperate quickly with cooperative partners while also handling greedy opponents properly. Experiments on ten games showed that the proposed algorithm quickly learned good strategies in nine of them.
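To illustrate the two-stage idea, here is a minimal sketch of a stage-switching agent for a repeated two-action game. The abstract does not specify which component algorithms are used, so the choices below are assumptions for illustration only: the “innocent” stage simply cooperates to invite mutual cooperation, the “suspicious” stage falls back to stateless epsilon-greedy Q-learning, and the class name `TwoStageAgent`, the action coding (0 = cooperate, 1 = defect), and the `switch_round` boundary are all hypothetical.

```python
import random


class TwoStageAgent:
    """Hypothetical sketch of a two-stage learner for a repeated 2-action game.

    Stage 1 ("innocent"): act cooperatively to invite mutual cooperation.
    Stage 2 ("suspicious"): epsilon-greedy Q-learning so that greedy
    opponents cannot keep exploiting the agent.
    The component learners and the stage boundary are illustrative
    assumptions, not the paper's exact design.
    """

    def __init__(self, n_actions=2, switch_round=50,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.n_actions = n_actions
        self.switch_round = switch_round   # rounds spent in the "innocent" stage
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = [0.0] * n_actions         # stateless Q-values, one per action
        self.round = 0

    def act(self):
        if self.round < self.switch_round:
            return 0                       # innocent stage: always cooperate
        if random.random() < self.epsilon: # suspicious stage: epsilon-greedy
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[a])

    def update(self, action, reward):
        # Stateless Q-learning update; learning also runs during stage 1
        # so the Q-values are warm when the suspicious stage begins.
        best_next = max(self.q)
        self.q[action] += self.alpha * (reward + self.gamma * best_next - self.q[action])
        self.round += 1


# Toy usage: two such agents in an iterated Prisoner's Dilemma.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

a, b = TwoStageAgent(), TwoStageAgent()
for _ in range(200):
    ai, bi = a.act(), b.act()
    ra, rb = PAYOFF[(ai, bi)]
    a.update(ai, ra)
    b.update(bi, rb)
print("final Q-values:", a.q, b.q)
```

In this sketch the switch from innocent to suspicious behavior is a fixed round count; a practical design could instead trigger the switch from observed opponent behavior, but that detail is not given in the abstract.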
