Convergence of Multi-Agent Learning with a Finite Step Size in General-Sum Games

Learning in a multi-agent system is challenging because all agents learn simultaneously, so each agent faces a non-stationary environment, which undermines standard convergence guarantees. To address this challenge, this paper presents a new gradient-based learning algorithm, called Gradient Ascent with Shrinking Policy Prediction (GA-SPP), which augments basic gradient ascent with the concept of shrinking policy prediction. The key idea behind this algorithm is that an agent adjusts its strategy in response to a forecast of the other agent's strategy, rather than to its current one. GA-SPP is formally shown to converge to a Nash equilibrium in a broader class of games than existing gradient-based multi-agent learning methods. Furthermore, unlike existing gradient-based methods, GA-SPP's theoretical guarantees do not require an infinitesimal learning rate.
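The paper specifies GA-SPP's exact update rules and convergence conditions; as an illustrative sketch only (not the authors' algorithm), the core idea can be demonstrated in a two-action general-sum game. Here each agent extrapolates its opponent's strategy along the opponent's own payoff gradient before taking its own finite-step gradient ascent update, and the prediction length shrinks over time. The payoff matrices, step size `eta`, prediction length `gamma`, and decay factor below are all hypothetical choices for the demonstration.

```python
import numpy as np

def project(x):
    """Project a mixed strategy (probability of action 0) back onto [0, 1]."""
    return min(max(x, 0.0), 1.0)

# Hypothetical general-sum (here, coordination) game: A is the row player's
# payoff matrix, B the column player's.
A = np.array([[1.0, 0.0], [0.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])

def grad_row(alpha, beta):
    """d/d(alpha) of expected row payoff [alpha, 1-alpha] A [beta, 1-beta]^T."""
    return (A[0] - A[1]) @ np.array([beta, 1.0 - beta])

def grad_col(alpha, beta):
    """d/d(beta) of expected column payoff [alpha, 1-alpha] B [beta, 1-beta]^T."""
    return np.array([alpha, 1.0 - alpha]) @ (B[:, 0] - B[:, 1])

alpha, beta = 0.3, 0.3   # initial mixed strategies
eta = 0.05               # finite (non-infinitesimal) step size
gamma = 0.5              # initial prediction length

for t in range(500):
    # Policy-prediction step: forecast the opponent's next strategy by
    # extrapolating along the opponent's own payoff gradient.
    beta_hat = project(beta + gamma * grad_col(alpha, beta))
    alpha_hat = project(alpha + gamma * grad_row(alpha, beta))
    # Gradient ascent against the *predicted* opponent strategy.
    alpha = project(alpha + eta * grad_row(alpha, beta_hat))
    beta = project(beta + eta * grad_col(alpha_hat, beta))
    # Shrink the prediction length over time.
    gamma *= 0.99
```

In this symmetric coordination game the dynamics settle on a pure Nash equilibrium (both players on the same action); the shrinking prediction length means the forecast fades as play stabilizes, so the fixed points coincide with those of plain gradient ascent.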
