Zhen Xiao | Hangyu Mao | Zhengchao Zhang | Zhibo Gong
[1] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[2] Srikanth Kandula, et al. Walking the tightrope: responsive yet stable traffic engineering, 2005, SIGCOMM '05.
[3] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[4] Monia Ghobadi, et al. Efficient traffic splitting on commodity switches, 2015, CoNEXT.
[5] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[6] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[7] Peter Stone, et al. Autonomous agents modelling other agents: A comprehensive survey and open problems, 2017, Artif. Intell.
[8] Jonathan Schaeffer, et al. Opponent Modeling in Poker, 1998, AAAI/IAAI.
[9] Robert Babuska, et al. A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients, 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[10] Nikos A. Vlassis, et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs, 2008, J. Artif. Intell. Res.
[11] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[12] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[13] J. Rexford, et al. Efficient Traffic Splitting on SDN Switches, 2015.
[14] Shimon Whiteson, et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, 2018, ICML.
[15] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[16] Lex Weaver, et al. A Multi-Agent Policy-Gradient Approach to Network Routing, 2001, ICML.
[17] David Fridovich-Keil, et al. Fully Decentralized Policies for Multi-Agent Systems: An Information Theoretic Approach, 2017, NIPS.
[18] Xiangxiang Chu, et al. Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning, 2017, ArXiv.
[19] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.
[20] Tuomas Sandholm, et al. Game theory-based opponent modeling in large imperfect-information games, 2011, AAMAS.
[21] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[22] Jordan L. Boyd-Graber, et al. Opponent Modeling in Deep Reinforcement Learning, 2016, ICML.
[23] Rob Fergus, et al. Modeling Others using Oneself in Multi-Agent Reinforcement Learning, 2018, ICML.
[24] Victor R. Lesser, et al. Coordinating multi-agent reinforcement learning with limited communication, 2013, AAMAS.
[25] Mykel J. Kochenderfer, et al. Cooperative Multi-agent Control Using Deep Reinforcement Learning, 2017, AAMAS Workshops.
[26] Manuela M. Veloso, et al. Reasoning about joint beliefs for execution-time communication decisions, 2005, AAMAS '05.
[27] Sandip Sen, et al. Reaching pareto-optimality in prisoner's dilemma using conditional joint action learning, 2007, Autonomous Agents and Multi-Agent Systems.
[28] Jon Crowcroft, et al. TCP-like congestion control for layered multicast data transfer, 1998, IEEE INFOCOM '98.
[29] Pablo Hernandez-Leal, et al. A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity, 2017, ArXiv.
[30] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[31] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.
[32] Min Zhu, et al. WCMP: weighted cost multipathing for improved fairness in data centers, 2014, EuroSys '14.
[33] Guy Lever, et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward, 2018, AAMAS.
[34] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.
[35] Zhang-Wei Hong, et al. A Deep Policy Inference Q-Network for Multi-Agent Systems, 2017, AAMAS.
[36] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.
[37] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[38] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[39] Jeffrey S. Rosenschein, et al. Best-response multiagent learning in non-stationary environments, 2004, AAMAS 2004.
[40] Yuxi Li, et al. Deep Reinforcement Learning: An Overview, 2017, ArXiv.
[41] Ming Zhou, et al. Mean Field Multi-Agent Reinforcement Learning, 2018, ICML.
[42] Peter Stone, et al. A Multiagent Approach to Autonomous Intersection Management, 2008, J. Artif. Intell. Res.
[43] Xiangyu Liu, et al. ACCNet: Actor-Coordinator-Critic Net for "Learning-to-Communicate" with Deep Multi-agent Reinforcement Learning, 2017, ArXiv.
[44] Shimon Whiteson, et al. Learning with Opponent-Learning Awareness, 2017, AAMAS.