Sriram Srinivasan | Marc Lanctot | Vinícius Flores Zambaldi | Julien Pérolat | Karl Tuyls | Rémi Munos | Michael H. Bowling
[1] H. W. Kuhn, et al. Extensive Games and the Problem of Information, 1953.
[2] S. Vajda. Some topics in two-person games, 1971.
[3] P. Taylor, et al. Evolutionarily Stable Strategies and Game Dynamics, 1978.
[4] Jeffrey Scott Vitter, et al. Random sampling with a reservoir, 1985, TOMS.
[5] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[6] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[7] S. Hart, et al. A simple adaptive procedure leading to correlated equilibrium, 2000.
[8] Josef Hofbauer, et al. Evolutionary Games and Population Dynamics, 1998.
[9] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[10] Yishay Mansour, et al. Nash Convergence of Gradient Dynamics in General-Sum Games, 2000, UAI.
[11] Richard S. Sutton, et al. Comparing Policy-Gradient Algorithms, 2001.
[12] Manuela M. Veloso, et al. Multiagent learning using a variable learning rate, 2002, Artif. Intell.
[13] G. Tesauro, et al. Analyzing Complex Strategic Interactions in Multi-Agent Systems, 2002.
[14] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[15] Rajarshi Das, et al. Choosing Samples to Compute Heuristic-Strategy Nash Equilibrium, 2003, AMEC.
[16] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.
[17] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[18] Michael H. Bowling, et al. Convergence and No-Regret in Multiagent Learning, 2004, NIPS.
[19] Michael L. Littman, et al. Cyclic Equilibria in Markov Games, 2005, NIPS.
[20] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[21] Paul W. Goldberg, et al. The complexity of computing a Nash equilibrium, 2006, STOC '06.
[22] Michael P. Wellman. Methods for Empirical Game-Theoretic Analysis, 2006, AAAI.
[23] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.
[24] Victor R. Lesser, et al. A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics, 2008, J. Artif. Intell. Res.
[25] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[26] Yoav Shoham, et al. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2009.
[27] Kevin Waugh, et al. Monte Carlo Sampling for Regret Minimization in Extensive Games, 2009, NIPS.
[28] Josef Hofbauer, et al. Time Average Replicator and Best-Reply Dynamics, 2009, Math. Oper. Res.
[29] Duane Szafron, et al. Using counterfactual regret minimization to create competitive multiplayer poker agents, 2010, AAMAS.
[30] Michael L. Littman, et al. Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration, 2010, ICML.
[31] Victor R. Lesser, et al. Multi-Agent Learning with Policy Prediction, 2010, AAAI.
[32] William H. Sandholm, et al. Population Games and Evolutionary Dynamics, 2010, Economic Learning and Social Evolution.
[33] Peter Vrancx, et al. Game Theory and Multi-agent Reinforcement Learning, 2012, Reinforcement Learning.
[34] Guillaume J. Laurent, et al. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems, 2012, The Knowledge Engineering Review.
[35] Todd W. Neller, et al. An Introduction to Counterfactual Regret Minimization, 2013.
[36] Richard Gibson, et al. Regret Minimization in Non-Zero-Sum Games with Applications to Building Champion Multiplayer Computer Poker Agents, 2013, ArXiv.
[37] Michael H. Bowling, et al. Monte Carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games, 2013.
[38] Marcello Restelli, et al. Efficient Evolutionary Dynamics with Extensive-Form Games, 2013, AAAI.
[39] Marcello Restelli, et al. Evolutionary Dynamics of Q-Learning over the Sequence Form, 2014, AAAI.
[40] Marc Lanctot, et al. Further developments of extensive-form replicator dynamics using the sequence-form representation, 2014, AAMAS.
[41] Neil Burch, et al. Heads-up limit hold'em poker is solved, 2015, Science.
[42] Kevin Waugh, et al. Solving Games with Functional Regret Estimation, 2014, AAAI Workshop: Computer Poker and Imperfect Information.
[43] Yishay Mansour, et al. Lower bounds on individual sequence regret, 2012, Machine Learning.
[44] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[45] David Silver, et al. Fictitious Self-Play in Extensive-Form Games, 2015, ICML.
[46] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[47] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[48] Michael H. Bowling, et al. Solving Heads-Up Limit Texas Hold'em, 2015, IJCAI.
[49] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[50] Karl Tuyls, et al. Evolutionary Dynamics of Multi-Agent Learning: A Survey, 2015, J. Artif. Intell. Res.
[51] Bruno Scherrer, et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, 2015, ICML.
[52] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[53] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[54] Bruno Scherrer, et al. On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games, 2016, AISTATS.
[55] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[56] Qian Yu, et al. Stochastic Evolution Dynamic of the Rock–Scissors–Paper Game Based on a Quasi Birth and Death Process, 2016, Scientific Reports.
[57] Amnon Shashua, et al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, 2016, ArXiv.
[58] Frans A. Oliehoek, et al. A Concise Introduction to Decentralized POMDPs, 2016, SpringerBriefs in Intelligent Systems.
[59] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[60] Branislav Bosanský, et al. Algorithms for computing strategies in two-player simultaneous move games, 2016, Artif. Intell.
[61] Sergey Levine, et al. Deep Reinforcement Learning for Robotic Manipulation, 2016, ArXiv.
[62] David Silver, et al. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, 2016, ArXiv.
[63] Jordan L. Boyd-Graber, et al. Opponent Modeling in Deep Reinforcement Learning, 2016, ICML.
[64] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.
[65] Pablo Hernandez-Leal, et al. A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity, 2017, ArXiv.
[66] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.
[67] Alexander Peysakhovich, et al. Maintaining cooperation in complex social dilemmas using deep reinforcement learning, 2017, ArXiv.
[68] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[69] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.
[70] Tuomas Sandholm, et al. Dynamic Thresholding and Pruning for Regret Minimization, 2017, AAAI.
[71] Yuandong Tian, et al. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning, 2016, ICLR.
[72] David Silver, et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, 2017, NIPS.
[73] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[74] Hoong Chuin Lau, et al. Policy Gradient With Value Function Approximation For Collective Multiagent Planning, 2018, NIPS.
[75] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[76] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[77] Alexander Peysakhovich, et al. Multi-Agent Cooperation and the Emergence of (Natural) Language, 2016, ICLR.
[78] Sergey Levine, et al. DeepMimic, 2018, ACM Trans. Graph.
[79] Alexandre M. Bayen, et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines, 2018, ICLR.
[80] Shimon Whiteson, et al. Learning with Opponent-Learning Awareness, 2017, AAMAS.
[81] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.
[82] Shimon Whiteson, et al. Expected Policy Gradients, 2017, AAAI.
[83] Sergey Levine, et al. Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods, 2018, ICRA.
[84] Kurt Keutzer, et al. Regret Minimization for Partially Observable Deep Reinforcement Learning, 2017, ICML.
[85] Zhang-Wei Hong, et al. A Deep Policy Inference Q-Network for Multi-Agent Systems, 2017, AAMAS.
[86] Joel Z. Leibo, et al. A Generalised Method for Empirical Game Theoretic Analysis, 2018, AAMAS.
[87] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[88] Pieter Abbeel, et al. Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments, 2017, ICLR.
[89] Stephen Clark, et al. Emergent Communication through Negotiation, 2018, ICLR.
[90] Peter Stone, et al. Autonomous agents modelling other agents: A comprehensive survey and open problems, 2017, Artif. Intell.
[91] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[92] Hao Liu, et al. Action-dependent Control Variates for Policy Optimization via Stein Identity, 2018, ICLR.
[93] Jakub W. Pachocki, et al. Emergent Complexity via Multi-Agent Competition, 2017, ICLR.
[94] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.
[95] Olivier Pietquin, et al. Actor-Critic Fictitious Play in Simultaneous Move Multistage Games, 2018, AISTATS.
[96] Viliam Lisý, et al. Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games, 2015, Machine Learning.
[97] Ian A. Kash, et al. Combining No-regret and Q-learning, 2019, AAMAS.