Rémi Munos | Karl Tuyls | Daniel Hennes | Marc Lanctot | Shayegan Omidshafiei | Dustin Morrill | Julien Pérolat | Jean-Baptiste Lespiau | Audrunas Gruslys
[1] Y. Mansour, et al. Algorithmic Game Theory: Learning, Regret Minimization, and Equilibria, 2007.
[2] David Silver, et al. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, 2016, ArXiv.
[3] A. Rustichini. Optimal Properties of Stimulus-Response Learning Models, 1999.
[4] E. C. Zeeman, et al. Population dynamics from game theory, 1980.
[5] Karl Tuyls, et al. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent, 2019, IJCAI.
[6] Rahul Savani, et al. Negative Update Intervals in Deep Multi-Agent Reinforcement Learning, 2018, AAMAS.
[7] Neil Burch, et al. Time and Space: Why Imperfect Information Games are Hard, 2018.
[8] Tuomas Sandholm, et al. Dynamic Thresholding and Pruning for Regret Minimization, 2017, AAAI.
[9] Yoav Shoham, et al. If multi-agent learning is the answer, what is the question?, 2007, Artif. Intell.
[10] James R. Wright, et al. Bounds for Approximate Regret-Matching Algorithms, 2019, ArXiv.
[11] Marcello Restelli, et al. Sequence-Form and Evolutionary Dynamics: Realization Equivalence to Agent Form and Logit Dynamics, 2016, AAAI.
[12] Éva Tardos, et al. Multiplicative updates outperform generic no-regret learning in congestion games: extended abstract, 2009, STOC '09.
[13] Frans A. Oliehoek, et al. A Concise Introduction to Decentralized POMDPs, 2016, SpringerBriefs in Intelligent Systems.
[14] Michael H. Bowling, et al. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments, 2018, NeurIPS.
[15] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[16] Martin Schmid, et al. Revisiting CFR+ and Alternating Updates, 2018, J. Artif. Intell. Res.
[17] Shane Legg, et al. Symmetric Decomposition of Asymmetric Games, 2017, Scientific Reports.
[18] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.
[19] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[20] Michael H. Bowling, et al. Convergence and No-Regret in Multiagent Learning, 2004, NIPS.
[21] Marcello Restelli, et al. Evolutionary Dynamics of Q-Learning over the Sequence Form, 2014, AAAI.
[22] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.
[23] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[24] Guillaume J. Laurent, et al. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems, 2012, The Knowledge Engineering Review.
[25] Shane Legg, et al. DeepMind Lab, 2016, ArXiv.
[26] Yoram Singer, et al. A primal-dual perspective of online learning algorithms, 2007, Machine Learning.
[27] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.
[28] Shimon Whiteson, et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, 2018, ICML.
[29] Michael H. Bowling, et al. Bayes' Bluff: Opponent Modelling in Poker, 2005, UAI.
[30] Michael Bowling, et al. Alternative Function Approximation Parameterizations for Solving Games: An Analysis of f-Regression Counterfactual Regret Minimization, 2020, AAMAS.
[31] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[32] Marc Lanctot, et al. Further developments of extensive-form replicator dynamics using the sequence-form representation, 2014, AAMAS.
[33] Tilman Börgers, et al. Learning Through Reinforcement and Replicator Dynamics, 1997.
[34] Neil Burch, et al. Heads-up limit hold'em poker is solved, 2015, Science.
[35] Karl Tuyls, et al. α-Rank: Multi-Agent Evaluation by Evolution, 2019.
[36] Michael H. Bowling, et al. Monte Carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games, 2013.
[37] Yishay Mansour, et al. Nash Convergence of Gradient Dynamics in General-Sum Games, 2000, UAI.
[38] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[39] Angelia Nedic, et al. On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging, 2013, SIAM J. Optim.
[40] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[41] Gerhard Weiss, et al. Multiagent Learning: Basics, Challenges, and Prospects, 2012, AI Mag.
[42] Karl Tuyls, et al. Evolutionary Dynamics of Multi-Agent Learning: A Survey, 2015, J. Artif. Intell. Res.
[43] Alex Graves, et al. Automated Curriculum Learning for Neural Networks, 2017, ICML.
[44] Joel Z. Leibo, et al. A Generalised Method for Empirical Game Theoretic Analysis, 2018, AAMAS.
[45] Karl Tuyls, et al. An Evolutionary Dynamical Analysis of Multi-Agent Learning in Iterated Games, 2005, Autonomous Agents and Multi-Agent Systems.
[46] Pablo Hernandez-Leal, et al. A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity, 2017, ArXiv.
[47] Sean Luke, et al. Cooperative Multi-Agent Learning: The State of the Art, 2005, Autonomous Agents and Multi-Agent Systems.
[48] Matthias Rauterberg, et al. State-coupled replicator dynamics, 2009, AAMAS.
[49] R. Cressman, et al. Strong stability and evolutionarily stable strategies with two types of players, 1991.
[50] Simon Parsons, et al. What evolutionary game theory tells us about multiagent learning, 2007, Artif. Intell.
[51] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[52] Noam Brown, et al. Superhuman AI for multiplayer poker, 2019, Science.
[53] Karl Tuyls, et al. Frequency adjusted multi-agent Q-learning, 2010, AAMAS.
[54] Marcello Restelli, et al. Efficient Evolutionary Dynamics with Extensive-Form Games, 2013, AAAI.
[55] Shai Shalev-Shwartz, et al. Online learning: theory, algorithms and applications, 2007.
[56] Jakub W. Pachocki, et al. Emergent Complexity via Multi-Agent Competition, 2017, ICLR.
[57] Sherief Abdallah, et al. Addressing Environment Non-Stationarity by Repeating Q-learning Updates, 2016, J. Mach. Learn. Res.
[58] Rahul Savani, et al. Lenient Multi-Agent Deep Reinforcement Learning, 2017, AAMAS.
[59] Pieter Abbeel, et al. Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments, 2017, ICLR.
[60] Shimon Whiteson, et al. Learning with Opponent-Learning Awareness, 2017, AAMAS.
[61] Tom Lenaerts, et al. An Evolutionary Game Theoretic Perspective on Learning in Multi-Agent Systems, 2004, Synthese.
[62] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.
[63] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[64] Christos H. Papadimitriou, et al. α-Rank: Multi-Agent Evaluation by Evolution, 2019, Scientific Reports.
[65] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[66] Josef Hofbauer, et al. Evolutionary Games and Population Dynamics, 1998.
[67] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[68] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[69] M. Nowak, et al. Evolutionary game theory, 1995, Current Biology.
[70] Tuomas Sandholm, et al. Deep Counterfactual Regret Minimization, 2018, ICML.
[71] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[72] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[73] Martin Wattenberg, et al. Ad click prediction: a view from the trenches, 2013, KDD.
[74] Georgios Piliouras, et al. From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization, 2020, ICML.
[75] Karl Tuyls, et al. Evolutionary Dynamics of Regret Minimization, 2010, ECML/PKDD.
[76] Kevin Waugh, et al. Solving Games with Functional Regret Estimation, 2014, AAAI Workshop: Computer Poker and Imperfect Information.
[77] Manuela M. Veloso, et al. Multiagent learning using a variable learning rate, 2002, Artif. Intell.
[78] H. Brendan McMahan. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization, 2011, AISTATS.
[79] Jan Ramon, et al. An evolutionary game-theoretic analysis of poker strategies, 2009, Entertain. Comput.
[80] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[81] Daniel Friedman, et al. Evolutionary Games in Natural, Social, and Virtual Worlds, 2016.
[82] Kevin Waugh, et al. Abstraction in Large Extensive Games, 2009.
[83] J. M. Smith, et al. The Logic of Animal Conflict, 1973, Nature.
[84] P. Taylor, et al. Evolutionarily Stable Strategies and Game Dynamics, 1978.
[85] Sriram Srinivasan, et al. OpenSpiel: A Framework for Reinforcement Learning in Games, 2019, ArXiv.
[86] H. Francis Song, et al. The Hanabi Challenge: A New Frontier for AI Research, 2019, Artif. Intell.
[87] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.
[88] Tom Lenaerts, et al. A selection-mutation model for Q-learning in multi-agent systems, 2003, AAMAS '03.
[89] Christos H. Papadimitriou, et al. Cycles in adversarial regularized learning, 2017, SODA.
[90] E. Zeeman. Dynamics of the evolution of animal conflicts, 1981.
[91] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[92] R. Cressman. Evolutionary Dynamics and Extensive Form Games, 2003.
[93] David Silver, et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, 2017, NIPS.
[94] Jörgen W. Weibull. Evolutionary Game Theory, 1996.
[95] Yuan Qi, et al. Double Neural Counterfactual Regret Minimization, 2018, ICLR.
[96] Josef Hofbauer, et al. Time Average Replicator and Best-Reply Dynamics, 2009, Math. Oper. Res.