[1] Harm van Seijen, et al. Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning, 2019, NeurIPS.
[2] R. Mazo. On the Theory of Brownian Motion, 1973.
[3] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[4] Richard Evans, et al. Deep Reinforcement Learning in Large Discrete Action Spaces, 2015, ArXiv.
[5] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[6] Shimon Whiteson, et al. Growing Action Spaces, 2019, ICML.
[7] Mohan S. Kankanhalli, et al. Inferring DQN structure for high-dimensional continuous control, 2020, ICML.
[8] Chongjie Zhang, et al. Towards Understanding Linear Value Decomposition in Cooperative Multi-Agent Q-Learning, 2020, ArXiv.
[9] Philip S. Thomas, et al. Learning Action Representations for Reinforcement Learning, 2019, ICML.
[10] Georg Ostrovski, et al. Temporally-Extended ε-Greedy Exploration, 2020, ICLR.
[11] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[12] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[13] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[14] Claude Berge, et al. Hypergraphs: Combinatorics of Finite Sets, 1989, North-Holland Mathematical Library.
[15] Shie Mannor, et al. Distributional Policy Optimization: An Alternative Approach for Continuous Control, 2019, NeurIPS.
[16] Shimon Whiteson, et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, 2018, ICML.
[17] K. Jarrod Millman, et al. Array programming with NumPy, 2020, Nature.
[18] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[19] Marco Wiering, et al. Using continuous action spaces to solve discrete problems, 2009, International Joint Conference on Neural Networks.
[20] Razvan Pascanu, et al. Relational inductive biases, deep learning, and graph networks, 2018, ArXiv.
[21] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[22] Shimon Whiteson, et al. The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning, 2019, AAMAS.
[23] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[24] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[25] Shimon Whiteson, et al. My Body Is a Cage: The Role of Morphology in Graph-Based Incompatible Control, 2020, ICLR.
[26] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[27] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[28] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[29] Andriy Mnih, et al. Q-Learning in enormous action spaces via amortized approximate maximization, 2020, ArXiv.
[30] Shimon Whiteson, et al. Deep Coordination Graphs, 2019, ICML.
[31] Navdeep Jaitly, et al. Discrete Sequential Prediction of Continuous Actions for Deep RL, 2017, ArXiv.
[32] Arash Tavakoli, et al. Action Branching Architectures for Deep Reinforcement Learning, 2017, AAAI.
[33] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Transactions on Neural Networks.
[34] Sanja Fidler, et al. NerveNet: Learning Structured Policy with Graph Neural Networks, 2018, ICLR.
[35] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] C. Watkins. Learning from delayed rewards, 1989.
[37] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, Journal of Artificial Intelligence Research.
[38] Balaraman Ravindran, et al. Learning to Factor Policies and Action-Value Functions: Factored Action Space Representations for Deep Reinforcement Learning, 2017, ArXiv.
[39] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, ArXiv.
[40] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents (Extended Abstract), 2018, IJCAI.
[41] Guy Lever, et al. Value-Decomposition Networks for Cooperative Multi-Agent Learning Based on Team Reward, 2018, AAMAS.
[42] Alexei A. Efros, et al. Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity, 2019, NeurIPS.
[43] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.
[44] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics and Autonomous Systems.
[45] Michail G. Lagoudakis, et al. Coordinated Reinforcement Learning, 2002, ICML.
[46] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[47] Vitaly Levdik, et al. Time Limits in Reinforcement Learning, 2017, ICML.
[48] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, Journal of Artificial Intelligence Research.
[49] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[50] Ah Chung Tsoi, et al. The Graph Neural Network Model, 2009, IEEE Transactions on Neural Networks.
[51] Wenlong Huang, et al. One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control, 2020, ICML.