Paper information - Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration

Q-learning in single-agent environments is known to converge in the limit given sufficient exploration. The same algorithm has been applied, with some success, in multi-agent environments, where traditional analysis techniques break down. Using established dynamical systems methods, we derive and study an idealization of Q-learning in 2-player 2-action repeated general-sum games. In particular, we address the discontinuous case of ε-greedy exploration and use it as a proxy for value-based algorithms to highlight a contrast with existing results in policy search. Analogously to previous results for gradient ascent algorithms, we provide a complete catalog of the convergence behavior of the ε-greedy Q-learning algorithm by introducing new subclasses of these games. We identify two subclasses of Prisoner's Dilemma-like games where the application of Q-learning with ε-greedy exploration results in higher-than-Nash average payoffs for some initial conditions.
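To make the setting concrete, the following is a minimal sketch of two independent ε-greedy Q-learners playing a repeated 2-player 2-action game. It is not the idealized continuous-time model the abstract refers to, only a discrete simulation of the underlying algorithm. The payoff values, the stateless (one Q-value per action) formulation, the parameter defaults, and every function name here are illustrative assumptions rather than details taken from the paper.

```python
import random

# Assumed Prisoner's Dilemma payoffs: action 0 = cooperate, 1 = defect.
# PAYOFF[a_row][a_col] = (row player's reward, column player's reward)
PAYOFF = [[(3, 3), (0, 5)],
          [(5, 0), (1, 1)]]


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    best = max(q)
    # Break ties between equally valued actions at random.
    return rng.choice([a for a, v in enumerate(q) if v == best])


def q_update(q, action, reward, alpha, gamma):
    """Stateless Q-learning update applied to the chosen action only (assumed form)."""
    q[action] += alpha * (reward + gamma * max(q) - q[action])


def simulate(steps=10000, epsilon=0.1, alpha=0.05, gamma=0.9, seed=0):
    """Run two independent learners and return their Q-values and average payoffs."""
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    total1 = total2 = 0.0
    for _ in range(steps):
        a1 = epsilon_greedy(q1, epsilon, rng)
        a2 = epsilon_greedy(q2, epsilon, rng)
        r1, r2 = PAYOFF[a1][a2]
        q_update(q1, a1, r1, alpha, gamma)
        q_update(q2, a2, r2, alpha, gamma)
        total1 += r1
        total2 += r2
    return q1, q2, total1 / steps, total2 / steps


if __name__ == "__main__":
    q1, q2, avg1, avg2 = simulate()
    print("Q-values:", q1, q2)
    print("Average payoffs:", avg1, avg2)
```

With these assumed payoffs, mutual defection is the Nash equilibrium and yields 1 per player per round, so comparing the reported averages against 1 across seeds and initial Q-values gives a rough empirical view of the higher-than-Nash behavior the abstract mentions.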
