Policy Gradient With Value Function Approximation For Collective Multiagent Planning
[1] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.
[2] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[3] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[4] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.
[5] Kee-Eung Kim, et al. Learning to Cooperate via Policy Search, 2000, UAI.
[6] Michail G. Lagoudakis, et al. Coordinated Reinforcement Learning, 2002, ICML.
[7] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[8] Claudia V. Goldman, et al. Solving Transition Independent Decentralized Markov Decision Processes, 2004, J. Artif. Intell. Res.
[9] Victor R. Lesser, et al. Decentralized Markov decision processes with event-driven interactions, 2004, AAMAS.
[10] Makoto Yokoo, et al. Networked Distributed POMDPs: A Synergy of Distributed Constraint Optimization and POMDPs, 2005, IJCAI.
[11] Andrew Y. Ng, et al. On Local Rewards and Scaling Distributed Reinforcement Learning, 2005, NIPS.
[12] Douglas Aberdeen, et al. Policy-Gradient Methods for Planning, 2005, NIPS.
[13] Andreas S. Schulz, et al. The Complexity of Congestion Games, 2008.
[14] Edmund H. Durfee, et al. Influence-Based Policy Abstraction for Weakly-Coupled Dec-POMDPs, 2010, ICAPS.
[15] Marc Toussaint, et al. Scalable Multiagent Planning Using Probabilistic Inference, 2011, IJCAI.
[16] Shih-Fen Cheng, et al. Decision Support for Agent Populations in Uncertain and Congested Environments, 2012, AAAI.
[17] Hari Balakrishnan, et al. TCP ex machina: computer-generated congestion control, 2013, SIGCOMM.
[18] Ari Hottinen, et al. Optimizing Spatial and Temporal Reuse in Wireless Networks by Decentralized Partially Observable Markov Decision Processes, 2014, IEEE Transactions on Mobile Computing.
[19] Patrick Jaillet, et al. Decentralized Stochastic Planning with Anonymity in Interactions, 2014, AAAI.
[20] Yingke Chen, et al. Individual Planning in Agent Populations: Exploiting Anonymity and Frame-Action Hypergraphs, 2015, ICAPS.
[21] Jonathan P. How, et al. Planning for decentralized control of multiple robots under uncertainty, 2015, ICRA.
[22] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[23] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[24] Marc Toussaint, et al. Probabilistic Inference Techniques for Scalable Multiagent Decision Making, 2015, J. Artif. Intell. Res.
[25] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[26] Shimon Whiteson, et al. Learning to Communicate with Deep Multi-Agent Reinforcement Learning, 2016, NIPS.
[27] David Silver, et al. Learning values across many orders of magnitude, 2016, NIPS.
[28] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[29] Mykel J. Kochenderfer, et al. Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs, 2015, AAAI.
[30] Hoong Chuin Lau, et al. Collective Multiagent Sequential Decision Making Under Uncertainty, 2017, AAAI.
[31] Joel Z. Leibo, et al. Multi-agent Reinforcement Learning in Sequential Social Dilemmas, 2017, AAMAS.