Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information
Jun Zhu | Jialian Li | Yichi Zhou
[1] O. H. Brownlee,et al. Activity Analysis of Production and Allocation , 1952 .
[2] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[3] Ariel Rubinstein,et al. A Course in Game Theory , 1995 .
[4] Michael H. Bowling,et al. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments , 2018, NeurIPS.
[5] E. Ordentlich,et al. Inequalities for the L1 Deviation of the Empirical Distribution , 2003 .
[6] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[7] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[8] Bart De Schutter,et al. Multi-agent Reinforcement Learning: An Overview , 2010 .
[9] Michael H. Bowling,et al. Bayes' Bluff: Opponent Modelling in Poker , 2005, UAI.
[10] Bruno Scherrer,et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games , 2015, ICML.
[11] Csaba Szepesvári,et al. Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms , 1996 .
[12] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.
[13] Lakhmi C. Jain,et al. Innovations in Multi-Agent Systems and Applications - 1 , 2010 .
[14] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[15] Michael H. Bowling,et al. Regret Minimization in Games with Incomplete Information , 2007, NIPS.
[16] Shipra Agrawal,et al. Near-Optimal Regret Bounds for Thompson Sampling , 2017, J. ACM.
[17] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res.
[18] Kevin Waugh,et al. Monte Carlo Sampling for Regret Minimization in Extensive Games , 2009, NIPS.
[19] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[20] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables , 1963 .
[21] Neil Burch,et al. Time and Space: Why Imperfect Information Games are Hard , 2018 .
[22] David Silver,et al. Fictitious Self-Play in Extensive-Form Games , 2015, ICML.
[23] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[24] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res.
[25] A. Pakes,et al. Markov-Perfect Industry Dynamics: A Framework for Empirical Work , 1995 .
[26] Chen-Yu Wei,et al. Online Reinforcement Learning in Stochastic Games , 2017, NIPS.
[27] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[28] Benjamin Van Roy,et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning? , 2016, ICML.
[29] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res.