Paper Information - Adaptive Two-stage Learning Algorithm for Repeated Games

Adaptive Two-stage Learning Algorithm for Repeated Games

In our society, people engage in a variety of interactions. To analyze them, we model the interactions as games and the people as agents equipped with reinforcement learning algorithms. Reinforcement learning algorithms are widely studied with the goal of identifying strategies that gain large payoffs in games; however, existing algorithms learn slowly because they require a large number of interactions. In this work, we constructed an algorithm that both learns quickly and maximizes payoffs in various repeated games. The proposed algorithm combines two different algorithms, one used in the early stage of learning and the other in the later stage. We conducted experiments in which the proposed agents played ten kinds of games, both in self-play and with other agents. The results showed that the proposed algorithm learned more quickly than existing algorithms and gained sufficiently large payoffs in nine of the ten games.

Masayuki Numao | Ken-ichi Fukui | Koichi Moriyama | Wataru Fujita
