Counterfactual Multi-Agent Policy Gradients
Shimon Whiteson | Triantafyllos Afouras | Jakob N. Foerster | Gregory Farquhar | Nantas Nardelli
[1] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems, 1994.
[2] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst.
[3] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[4] Shigenobu Kobayashi,et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function , 1998, ICML.
[5] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[6] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[7] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[8] Kagan Tumer,et al. Optimal Payoff Functions for Members of Collectives , 2001, Adv. Complex Syst.
[9] Leslie Pack Kaelbling,et al. All learning is Local: Multi-agent Learning in Global Reward Games , 2003, NIPS.
[10] Erfu Yang,et al. Multiagent Reinforcement Learning for Multi-Robot Systems: A Survey, 2004.
[11] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[12] Wang Ying,et al. Multi-agent framework for third party logistics in E-commerce , 2005, Expert Syst. Appl.
[13] Danny Weyns,et al. The Packet-World: A Test Bed for Investigating Situated Multi-Agent Systems, 2005.
[14] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[15] Kagan Tumer,et al. Distributed agent-based air traffic flow management , 2007, AAMAS '07.
[16] Nikos A. Vlassis,et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..
[17] Bart De Schutter,et al. A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[18] Yoav Shoham,et al. Multiagent Systems - Algorithmic, Game-Theoretic, and Logical Foundations, 2009.
[19] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.
[20] Martin A. Riedmiller,et al. Reinforcement learning in feedback control , 2011, Machine Learning.
[21] Kagan Tumer,et al. Modeling difference rewards for multiagent learning , 2012, AAMAS.
[22] Wenwu Yu,et al. An Overview of Recent Progress in the Study of Distributed Multi-Agent Coordination , 2012, IEEE Transactions on Industrial Informatics.
[23] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[24] Kevin Leyton-Brown,et al. Empirically Evaluating Multiagent Learning Algorithms , 2014, ArXiv.
[25] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[26] Kagan Tumer,et al. Approximating Difference Evaluations with Local Information , 2015, AAMAS.
[27] Yun Yang,et al. A Multi-Agent Framework for Packet Routing in Wireless Sensor Networks , 2015, Sensors.
[28] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[29] Shimon Whiteson,et al. Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.
[30] Bikramjit Banerjee,et al. Multi-agent reinforcement learning as a rehearsal for decentralized planning , 2016, Neurocomputing.
[31] Rob Fergus,et al. Learning Multiagent Communication with Backpropagation , 2016, NIPS.
[32] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[33] Florian Richoux,et al. TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games , 2016, ArXiv.
[34] Nicolas Usunier,et al. Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks , 2016, ArXiv.
[35] Emil Gustavsson,et al. Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence , 2016, ArXiv.
[36] Shimon Whiteson,et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.
[37] Mykel J. Kochenderfer,et al. Cooperative Multi-agent Control Using Deep Reinforcement Learning , 2017, AAMAS Workshops.
[38] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[39] Peng Peng,et al. Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games , 2017, 1703.10069.
[40] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[41] Dorian Kodelja,et al. Multiagent cooperation and competition with deep reinforcement learning , 2015, PloS one.
[42] Jun Wang,et al. Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games , 2017, ArXiv.
[43] Jonathan P. How,et al. Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability , 2017, ICML.
[44] Jonathan P. How,et al. Deep Decentralized Multi-task Multi-Agent RL under Partial Observability, 2017.
[45] Stefan Lee,et al. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[46] Joel Z. Leibo,et al. Multi-agent Reinforcement Learning in Sequential Social Dilemmas , 2017, AAMAS.
[47] Alexander Peysakhovich,et al. Multi-Agent Cooperation and the Emergence of (Natural) Language , 2016, ICLR.
[48] Pieter Abbeel,et al. Emergence of Grounded Compositional Language in Multi-Agent Populations , 2017, AAAI.