αα-Rank: Practically Scaling α-Rank through Stochastic Optimisation

Recently, α-Rank, a graph-based algorithm, has been proposed as a solution for ranking joint policy profiles in large-scale multi-agent systems. α-Rank claimed tractability through a polynomial-time implementation with respect to the total number of pure strategy profiles. Here, we note that the inputs to the algorithm were not clearly specified in the original presentation; as such, we deem the complexity claims not grounded, and conjecture that solving α-Rank is NP-hard. The authors of α-Rank suggested that the input to α-Rank can be an exponentially sized payoff matrix, a claim they promised to clarify later. Even though α-Rank exhibits a polynomial-time solution with respect to such an input, we highlight further critical problems. We demonstrate that, due to the need to construct an exponentially large Markov chain, α-Rank is infeasible beyond a small finite number of agents. We ground these claims by adopting the amount of dollars spent as a non-refutable evaluation metric. Recognising this scalability issue, we present a stochastic implementation of α-Rank with a double-oracle mechanism that allows for reductions in the joint strategy space. Our method, αα-Rank, does not need to store the exponentially large transition matrix and can terminate early once a required precision is reached. Although our method theoretically exhibits worst-case complexity guarantees similar to those of α-Rank, it allows us, for the first time, to practically conduct large-scale multi-agent evaluations. On 10^4 × 10^4 random matrices, we achieve a 1000x reduction in computation time. Furthermore, we also show successful results on large joint strategy profiles with a maximum size in the order of O(2^25) (≈ 33 million joint strategies) – a setting that cannot be evaluated using α-Rank with a reasonable computational budget.
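The abstract describes the core computational idea: the α-Rank ranking is the stationary distribution π of a Markov chain C over joint strategy profiles (π^T C = π^T), and αα-Rank recovers π by stochastic optimisation, materialising only the entries of C it needs rather than the full exponentially large matrix. Below is a minimal sketch of that idea on a toy two-player game; the payoff tables, the simplified unilateral-deviation transition model, the fixation-probability formula, and all hyper-parameters (alpha, m, batch size, learning rate) are illustrative assumptions, the double-oracle strategy-space reduction is omitted, and this is not the authors' implementation.

```python
import numpy as np

# Minimal sketch (NOT the authors' implementation): estimate the stationary
# distribution pi of an alpha-Rank-style Markov chain over joint strategy
# profiles by stochastic optimisation of ||C^T x - x||^2, building one column
# of the transition matrix C at a time instead of storing all of C.
# Toy payoffs, the unilateral-deviation transition model, and all
# hyper-parameters below are illustrative assumptions.

rng = np.random.default_rng(0)
n_strats = [8, 8]                                   # strategies per player (toy sizes)
profiles = [(a, b) for a in range(n_strats[0]) for b in range(n_strats[1])]
S = len(profiles)                                   # number of joint profiles
payoff = [rng.random((n_strats[0], n_strats[1])) for _ in range(2)]
alpha, m = 10.0, 50                                 # selection intensity, population size
n_dev = sum(n - 1 for n in n_strats)                # unilateral deviations per profile

def fixation(r_mut, r_res):
    """Probability that a single mutant with payoff r_mut overtakes residents with payoff r_res."""
    if np.isclose(r_mut, r_res):
        return 1.0 / m
    u = np.exp(-alpha * (r_mut - r_res))
    return (1.0 - u) / (1.0 - u ** m)

def off_diag(i, j):
    """Transition probability C[i, j] for i != j: only unilateral deviations are reachable."""
    src, tgt = profiles[i], profiles[j]
    diff = [k for k in range(2) if src[k] != tgt[k]]
    if len(diff) != 1:
        return 0.0
    k = diff[0]
    return fixation(payoff[k][tgt], payoff[k][src]) / n_dev

def column(j):
    """The j-th column of C, built on the fly; the diagonal entry completes row j to sum to 1."""
    col = np.array([off_diag(i, j) for i in range(S)])
    col[j] = 1.0 - sum(off_diag(j, i) for i in range(S) if i != j)
    return col

# Stochastic minimisation of the residual ||C^T x - x||^2, sampling a
# mini-batch of columns per step and terminating early once the batch
# residual is small (mirroring the early-stopping idea in the abstract).
x = np.full(S, 1.0 / S)
lr, batch_size, tol = 0.25, 16, 1e-4
for step in range(5000):
    batch = rng.choice(S, size=batch_size, replace=False)
    grad, resid_sq = np.zeros(S), 0.0
    for j in batch:
        c_j = column(j)
        r = c_j @ x - x[j]                          # j-th entry of C^T x - x
        grad += r * c_j
        grad[j] -= r
        resid_sq += r * r
    x = np.clip(x - lr * grad, 0.0, None)
    x /= x.sum()                                    # crude projection back onto the simplex
    if np.sqrt(resid_sq / batch_size) < tol:
        break

top = np.argsort(-x)[:5]
print("top joint profiles:", [profiles[k] for k in top])
print("stationary mass   :", np.round(x[top], 4))
```

For an irreducible chain, the unique minimiser of ||C^T x - x||^2 on the probability simplex is the stationary distribution, which is why sampling a handful of columns per step can make progress without ever holding C in memory.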
