Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients
Daniel Hennes | Marc Lanctot | Karl Tuyls | Edgar A. Duéñez-Guzmán | Shayegan Omidshafiei | Dustin Morrill | Jean-Baptiste Lespiau | Paavo Parmas | Rémi Munos | Julien Perolat | Audrunas Gruslys
[1] A. Rustichini. Optimal Properties of Stimulus-Response Learning Models, 1999.
[2] Karl Tuyls, et al. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent, 2019, IJCAI.
[3] Neil Burch, et al. Time and Space: Why Imperfect Information Games are Hard, 2018.
[4] R. Cressman. Evolutionary Dynamics and Extensive Form Games, 2003.
[5] Pieter Abbeel, et al. Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments, 2017, ICLR.
[6] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.
[7] R. Cressman, et al. Strong stability and evolutionarily stable strategies with two types of players, 1991.
[8] Michael H. Bowling, et al. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments, 2018, NeurIPS.
[9] Noam Brown, et al. Superhuman AI for multiplayer poker, 2019, Science.
[10] Guillaume J. Laurent, et al. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems, 2012, The Knowledge Engineering Review.
[11] P. Taylor, et al. Evolutionarily Stable Strategies and Game Dynamics, 1978.
[12] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.
[13] E. C. Zeeman. Population dynamics from game theory, 1980.
[14] Marc Lanctot, et al. Further developments of extensive-form replicator dynamics using the sequence-form representation, 2014, AAMAS.
[15] Marcello Restelli, et al. Sequence-Form and Evolutionary Dynamics: Realization Equivalence to Agent Form and Logit Dynamics, 2016, AAAI.
[16] Éva Tardos, et al. Multiplicative updates outperform generic no-regret learning in congestion games: extended abstract, 2009, STOC '09.
[17] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[18] Neil Burch, et al. Heads-up limit hold'em poker is solved, 2015, Science.
[19] Tom Lenaerts, et al. A selection-mutation model for Q-learning in multi-agent systems, 2003, AAMAS '03.
[20] Sean Luke, et al. Cooperative Multi-Agent Learning: The State of the Art, 2005, Autonomous Agents and Multi-Agent Systems.
[21] Christos H. Papadimitriou, et al. Cycles in adversarial regularized learning, 2017, SODA.
[22] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[23] E. Zeeman. Dynamics of the evolution of animal conflicts, 1981.
[24] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[25] Sriram Srinivasan, et al. OpenSpiel: A Framework for Reinforcement Learning in Games, 2019, ArXiv.
[26] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.
[27] Michael H. Bowling, et al. Bayes' Bluff: Opponent Modelling in Poker, 2005, UAI.
[28] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[29] Yoram Singer, et al. A primal-dual perspective of online learning algorithms, 2007, Machine Learning.
[30] Tuomas Sandholm, et al. Deep Counterfactual Regret Minimization, 2018, ICML.
[31] Angelia Nedic, et al. On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging, 2013, SIAM J. Optim.
[32] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[33] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[34] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[35] Martin Wattenberg, et al. Ad click prediction: a view from the trenches, 2013, KDD.
[36] Georgios Piliouras, et al. From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization, 2020, ICML.
[37] H. Francis Song, et al. The Hanabi Challenge: A New Frontier for AI Research, 2019, Artif. Intell.
[38] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.
[39] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.
[40] Karl Tuyls, et al. Evolutionary Dynamics of Multi-Agent Learning: A Survey, 2015, J. Artif. Intell. Res.
[41] Alex Graves, et al. Automated Curriculum Learning for Neural Networks, 2017, ICML.
[42] Karl Tuyls, et al. Frequency adjusted multi-agent Q-learning, 2010, AAMAS.
[43] Shai Shalev-Shwartz. Online Learning: Theory, Algorithms and Applications, 2007.
[44] Jakub W. Pachocki, et al. Emergent Complexity via Multi-Agent Competition, 2017, ICLR.
[45] Tom Lenaerts, et al. An Evolutionary Game Theoretic Perspective on Learning in Multi-Agent Systems, 2004, Synthese.
[46] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[47] Martin Schmid, et al. Revisiting CFR+ and Alternating Updates, 2018, J. Artif. Intell. Res.
[48] David Silver, et al. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, 2016, ArXiv.
[49] D. M. V. Hesteren. Evolutionary Game Theory, 2017.
[50] Michael H. Bowling, et al. Monte Carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games, 2013.
[51] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[52] H. Brendan McMahan, et al. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization, 2011, AISTATS.
[53] Jan Ramon, et al. An evolutionary game-theoretic analysis of poker strategies, 2009, Entertain. Comput.
[54] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[55] Christos H. Papadimitriou, et al. α-Rank: Multi-Agent Evaluation by Evolution, 2019, Scientific Reports.
[56] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[57] Josef Hofbauer, et al. Evolutionary Games and Population Dynamics, 1998.
[58] David Silver, et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, 2017, NIPS.
[59] John E. R. Staddon. The dynamics of behavior: Review of Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed.), 2020.
[60] Yuan Qi, et al. Double Neural Counterfactual Regret Minimization, 2018, ICLR.
[61] Josef Hofbauer, et al. Time Average Replicator and Best-Reply Dynamics, 2009, Math. Oper. Res.
[62] Daniel Friedman, et al. Evolutionary Games in Natural, Social, and Virtual Worlds, 2016.
[63] Shane Legg, et al. Symmetric Decomposition of Asymmetric Games, 2017, Scientific Reports.
[64] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.
[65] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[66] Marcello Restelli, et al. Evolutionary Dynamics of Q-Learning over the Sequence Form, 2014, AAAI.
[67] Michael Bowling, et al. Alternative Function Approximation Parameterizations for Solving Games: An Analysis of f-Regression Counterfactual Regret Minimization, 2020, AAMAS.
[68] Marcello Restelli, et al. Efficient Evolutionary Dynamics with Extensive-Form Games, 2013, AAAI.
[69] Shimon Whiteson, et al. Learning with Opponent-Learning Awareness, 2017, AAMAS.
[70] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[71] Tuomas Sandholm, et al. Dynamic Thresholding and Pruning for Regret Minimization, 2017, AAAI.
[72] Kevin Waugh, et al. Solving Games with Functional Regret Estimation, 2014, AAAI Workshop: Computer Poker and Imperfect Information.
[73] Gerhard Weiss, et al. Multiagent Learning: Basics, Challenges, and Prospects, 2012, AI Mag.
[74] Joel Z. Leibo, et al. A Generalised Method for Empirical Game Theoretic Analysis, 2018, AAMAS.