Guy Lever | Joel Z. Leibo | Max Jaderberg | Wojciech M. Czarnecki | Peter Sunehag | Thore Graepel | Karl Tuyls | Marc Lanctot | Audrunas Gruslys | Vinícius Flores Zambaldi | Nicolas Sonnerat