Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information

Posterior sampling for reinforcement learning (PSRL) is a useful framework for decision making in an unknown environment. PSRL maintains a posterior distribution over the environment and plans on an environment sampled from that posterior. Although PSRL works well on single-agent reinforcement learning problems, how to apply it to multi-agent reinforcement learning is relatively unexplored. In this work, we extend PSRL to two-player zero-sum extensive games with imperfect information (TZIEG), a class of multi-agent systems. More specifically, we combine PSRL with counterfactual regret minimization (CFR), the leading algorithm for TZIEG with a known environment. Our main contribution is a novel design of interaction strategies. With these interaction strategies, our algorithm provably converges to a Nash equilibrium at a rate of $O(\sqrt{\log T/T})$. Empirical results show that our algorithm performs well.
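To make the PSRL-plus-CFR loop concrete, below is a minimal sketch of the outer loop under simplifying assumptions: maintain a Dirichlet posterior over unknown chance probabilities, sample an environment from the posterior, solve the sampled game, interact with the true environment, and update the posterior from the observation. For brevity the sampled game is a normal-form zero-sum game solved by regret matching (a one-node special case of CFR); the environment (TRUE_WEIGHTS, PAYOFFS), the solver regret_matching_solve, and the plain Monte Carlo interaction step are illustrative assumptions, not the paper's extensive-form algorithm or its interaction-strategy design.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy environment: the unknown quantity is the chance distribution
# over two zero-sum payoff matrices; a Dirichlet posterior over these weights
# stands in for the chance/transition posteriors PSRL would maintain in a TZIEG.
TRUE_WEIGHTS = np.array([0.7, 0.3])
PAYOFFS = [np.array([[1.0, -1.0], [-1.0, 1.0]]),   # matching pennies
           np.array([[0.0,  1.0], [-1.0, 0.0]])]   # a skewed variant

def regret_matching_solve(A, iters=2000):
    """Average strategies of regret matching on zero-sum matrix A
    (a normal-form stand-in for running CFR on the sampled game)."""
    n, m = A.shape
    r1, r2 = np.zeros(n), np.zeros(m)
    s1_sum, s2_sum = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        p1 = np.maximum(r1, 0.0)
        p1 = p1 / p1.sum() if p1.sum() > 0 else np.full(n, 1.0 / n)
        p2 = np.maximum(r2, 0.0)
        p2 = p2 / p2.sum() if p2.sum() > 0 else np.full(m, 1.0 / m)
        u1 = A @ p2          # row player's per-action values
        u2 = -(p1 @ A)       # column player's per-action values
        r1 += u1 - p1 @ u1   # accumulate regrets
        r2 += u2 - p2 @ u2
        s1_sum += p1
        s2_sum += p2
    return s1_sum / s1_sum.sum(), s2_sum / s2_sum.sum()

alpha = np.ones(2)  # Dirichlet posterior over the unknown chance weights
for episode in range(200):
    w = rng.dirichlet(alpha)                    # sample an environment from the posterior
    A = w[0] * PAYOFFS[0] + w[1] * PAYOFFS[1]   # expected game under the sampled weights
    s1, s2 = regret_matching_solve(A)           # "plan": solve the sampled game
    outcome = rng.choice(2, p=TRUE_WEIGHTS)     # interact: observe a chance outcome
    alpha[outcome] += 1                         # posterior update from the observation

print("posterior mean of chance weights:", alpha / alpha.sum())

In this sketch the interaction step simply observes one chance outcome per episode; the paper's contribution lies precisely in how the players' interaction strategies are designed so that the induced observations drive the averaged CFR strategies to a Nash equilibrium at the stated rate.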
