Learning Expensive Coordination: An Event-Based Deep RL Approach
Hanjiang Lai | Xinrun Wang | Bo An | Youzhi Zhang | Zhenyu Shi | Rundong Wang | Runsheng Yu
[1] V. Borkar. Stochastic approximation with two time scales, 1997.
[2] Régis Sabbadin,et al. A Tractable Leader-Follower MDP Model for Animal Disease Management , 2013, AAAI.
[3] Eduardo F. Morales,et al. An Introduction to Reinforcement Learning, 2011.
[4] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[5] Fei Sha,et al. Actor-Attention-Critic for Multi-Agent Reinforcement Learning , 2018, ICML.
[6] Yan Hong,et al. Reinforcement Mechanism Design, with Applications to Dynamic Pricing in Sponsored Search Auctions , 2017, ArXiv.
[7] Jordan L. Boyd-Graber,et al. Opponent Modeling in Deep Reinforcement Learning , 2016, ICML.
[8] Pingzhong Tang,et al. Reinforcement mechanism design , 2017, IJCAI.
[9] George J. Pappas,et al. Taxi Dispatch With Real-Time Sensing Data in Metropolitan Areas: A Receding Horizon Control Approach , 2015, IEEE Transactions on Automation Science and Engineering.
[10] Sergio Valcarcel Macua,et al. Coordinating the Crowd: Inducing Desirable Equilibria in Non-Cooperative Systems , 2019, AAMAS.
[11] Alex Graves,et al. Strategic Attentive Writer for Learning Macro-Actions , 2016, NIPS.
[12] B. Chaib-draa,et al. Multiagent Q-Learning: Preliminary Study on Dominance between the Nash and Stackelberg Equilibriums, 2005.
[13] Ron Lavi,et al. Algorithmic Mechanism Design , 2008, Encyclopedia of Algorithms.
[14] Joel Z. Leibo,et al. A Generalised Method for Empirical Game Theoretic Analysis , 2018, AAMAS.
[15] Utkarsh Upadhyay,et al. Deep Reinforcement Learning of Marked Temporal Point Processes , 2018, NeurIPS.
[16] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[17] Yan Zheng,et al. A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents , 2018, NeurIPS.
[18] Doina Precup,et al. Intra-Option Learning about Temporally Abstract Actions , 1998, ICML.
[19] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[20] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[21] Alexandre Alahi,et al. Crowd-Robot Interaction: Crowd-Aware Robot Navigation With Attention-Based Deep Reinforcement Learning , 2018, 2019 International Conference on Robotics and Automation (ICRA).
[22] Claudia V. Goldman,et al. Solving Transition Independent Decentralized Markov Decision Processes , 2004, J. Artif. Intell. Res.
[23] Noam Brown,et al. Superhuman AI for multiplayer poker , 2019, Science.
[24] Chi Cheng,et al. A multi-agent reinforcement learning algorithm based on Stackelberg game , 2017, 2017 6th Data Driven Control and Learning Systems (DDCLS).
[25] H. Francis Song,et al. Machine Theory of Mind , 2018, ICML.
[26] Philip S. Thomas,et al. Learning Action Representations for Reinforcement Learning , 2019, ICML.
[27] S. Bhattacharyya,et al. Leader-Follower semi-Markov Decision Problems: Theoretical Framework and Approximate Solution , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[28] Shimon Whiteson,et al. DAC: The Double Actor-Critic Architecture for Learning Options , 2019, NeurIPS.
[29] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.
[30] Joelle Pineau,et al. An Inference-Based Policy Gradient Method for Learning Options , 2018, ICML.
[31] Luciano Messori. The Theory of Incentives I: The Principal-Agent Model, 2013.
[32] Régis Sabbadin,et al. Leader-Follower MDP Models with Factored State Space and Many Followers - Followers Abstraction, Structured Dynamics and State Aggregation , 2016, ECAI.
[33] Lillian J. Ratliff,et al. Convergence of Learning Dynamics in Stackelberg Games , 2019, ArXiv.
[34] Shimon Whiteson,et al. Learning with Opponent-Learning Awareness , 2017, AAMAS.
[35] Luca Antiga,et al. Automatic differentiation in PyTorch, 2017.
[36] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[37] Akshat Kumar,et al. Planning and Learning for Decentralized MDPs With Event Driven Rewards , 2018, AAAI.
[38] Nicolas Le Roux,et al. The Value Function Polytope in Reinforcement Learning , 2019, ICML.
[39] Jan Peters,et al. Probabilistic inference for determining options in reinforcement learning , 2016, Machine Learning.
[40] Alan Fern,et al. Learning and Transferring Roles in Multi-Agent Reinforcement, 2008.