Martin Schmid | Neil Burch | Marc Lanctot | Matej Moravčík | Rudolf Kadlec | Michael H. Bowling
[1] Sepp Hochreiter, et al. RUDDER: Return Decomposition for Delayed Rewards, 2018, NeurIPS.
[2] Duane Szafron, et al. Generalized Sampling and Variance in Counterfactual Regret Minimization, 2012, AAAI.
[3] Michael H. Bowling, et al. Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization, 2012, AAMAS.
[4] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[5] Jakub W. Pachocki, et al. Emergent Complexity via Multi-Agent Competition, 2017, ICLR.
[6] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[7] Michael H. Bowling, et al. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments, 2018, NeurIPS.
[8] Michael H. Bowling, et al. Bayes' Bluff: Opponent Modelling in Poker, 2005, UAI.
[9] Michael H. Bowling, et al. AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games, 2016, AAAI.
[10] Barteld Kooi, et al. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI-11), 2011.
[11] Kevin Waugh, et al. Accelerating Best Response Calculation in Large Extensive Games, 2011, IJCAI.
[12] Shimon Whiteson, et al. Learning with Opponent-Learning Awareness, 2017, AAMAS.
[13] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[14] Neil Burch, et al. Heads-up limit hold'em poker is solved, 2015, Science.
[15] Hao Liu, et al. Action-dependent Control Variates for Policy Optimization via Stein Identity, 2018, ICLR.
[16] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.
[17] Joel Veness, et al. Variance Reduction in Monte-Carlo Tree Search, 2011, NIPS.
[18] Kevin Waugh, et al. Solving Games with Functional Regret Estimation, 2014, AAAI Workshop: Computer Poker and Imperfect Information.
[19] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[20] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.
[21] Neil Burch, et al. Time and Space: Why Imperfect Information Games are Hard, 2018.
[22] David Silver, et al. Fictitious Self-Play in Extensive-Form Games, 2015, ICML.
[23] David Silver, et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, 2017, NIPS.
[24] Kevin Waugh, et al. Monte Carlo Sampling for Regret Minimization in Extensive Games, 2009, NIPS.
[25] Sergey Levine, et al. The Mirage of Action-Dependent Baselines in Reinforcement Learning, 2018, ICML.
[26] Yoav Shoham, et al. Multiagent Systems - Algorithmic, Game-Theoretic, and Logical Foundations, 2009.
[27] S. Hart, et al. A simple adaptive procedure leading to correlated equilibrium, 2000.
[28] Alexandre M. Bayen, et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines, 2018, ICLR.
[29] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[30] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.
[31] Mark H. M. Winands, et al. Quality-based Rewards for Monte-Carlo Tree Search Simulations, 2014, ECAI.
[32] Michael H. Bowling, et al. Monte Carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games, 2013.
[33] Michael H. Bowling, et al. Solving Imperfect Information Games Using Decomposition, 2013, AAAI.
[34] Michael H. Bowling, et al. Solving Heads-Up Limit Texas Hold'em, 2015, IJCAI.
[35] P. Boyle. Options: A Monte Carlo approach, 1977.