A New Formalism, Method and Open Issues for Zero-Shot Coordination
Michael Dennis | Jakob Foerster | Caspar Oesterheld | Johannes Treutlein