Improved Cooperative Multi-agent Reinforcement Learning Algorithm Augmented by Mixing Demonstrations from Centralized Policy
[1] Shimon Whiteson,et al. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks , 2016, ArXiv.
[2] Michael L. Littman,et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes , 1997, UAI.
[3] Matthew E. Taylor,et al. Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL , 2018, ArXiv.
[4] Frans A. Oliehoek,et al. A Concise Introduction to Decentralized POMDPs , 2016, SpringerBriefs in Intelligent Systems.
[5] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[6] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[7] Neil Immerman,et al. The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.
[8] Craig Boutilier,et al. Sequential Optimality and Coordination in Multiagent Systems , 1999, IJCAI.
[9] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[10] Goele Pipeleers,et al. Flexible Multi-Agent System for Distributed Coordination, Transportation & Localisation , 2018, AAMAS.
[11] François Charpillet,et al. Mixed Integer Linear Programming for Exact Finite-Horizon Planning in Decentralized POMDPs , 2007, ICAPS.
[12] Olivier Buffet,et al. Optimally Solving Dec-POMDPs as Continuous-State MDPs , 2013, IJCAI.
[13] Makoto Yokoo,et al. Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings , 2003, IJCAI.
[14] Jonathan P. How,et al. Learning to Teach in Cooperative Multiagent Reinforcement Learning , 2018, AAAI.
[15] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[16] Shimon Whiteson,et al. Multi-Agent Common Knowledge Reinforcement Learning , 2018, NeurIPS.
[17] Bikramjit Banerjee,et al. Multi-agent reinforcement learning as a rehearsal for decentralized planning , 2016, Neurocomputing.
[18] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[19] Sergey Levine,et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations , 2017, Robotics: Science and Systems.
[20] Huimin Ma,et al. Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations , 2018, ArXiv.
[21] Shlomo Zilberstein,et al. Achieving goals in decentralized POMDPs , 2009, AAMAS.
[22] Nikos A. Vlassis,et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res.
[23] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[24] Shimon Whiteson,et al. Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.
[25] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[26] M. Veloso,et al. Multiagent Collaborative Task Learning through Imitation , 2007.
[27] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[28] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[29] J. Andrew Bagnell,et al. Efficient Reductions for Imitation Learning , 2010, AISTATS.
[30] Byron Boots,et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction , 2017, ICML.
[31] François Charpillet,et al. MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs , 2005, UAI.
[32] Jiashi Feng,et al. Policy Optimization with Demonstrations , 2018, ICML.
[33] Andrea Lockerd Thomaz,et al. Exploration from Demonstration for Interactive Reinforcement Learning , 2016, AAMAS.
[34] Jonathan P. How,et al. Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability , 2017, ICML.
[35] Shlomo Zilberstein,et al. Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs , 2007, UAI.
[36] Yisong Yue,et al. Generative Multi-Agent Behavioral Cloning , 2018, ArXiv.
[37] Frans A. Oliehoek,et al. Sufficient Plan-Time Statistics for Decentralized POMDPs , 2013, IJCAI.
[38] Dorian Kodelja,et al. Multiagent cooperation and competition with deep reinforcement learning , 2015, PLoS ONE.
[39] Shimon Whiteson,et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.
[40] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[41] Mykel J. Kochenderfer,et al. Cooperative Multi-agent Control Using Deep Reinforcement Learning , 2017, AAMAS Workshops.
[42] Shlomo Zilberstein,et al. Dynamic Programming for Partially Observable Stochastic Games , 2004, AAAI.
[43] Steven Reece,et al. Human–agent collaboration for disaster response , 2015, Autonomous Agents and Multi-Agent Systems.
[44] Frans A. Oliehoek,et al. The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems , 2015, AAAI Fall Symposia.
[45] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[46] Nezih Altay,et al. OR/MS research in disaster operations management , 2006, Eur. J. Oper. Res..
[47] Yisong Yue,et al. Coordinated Multi-Agent Imitation Learning , 2017, ICML.
[48] Daqiang Zhang,et al. Towards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordination , 2016, Comput. Networks.