[1] Michael I. Jordan, et al. RLlib: Abstractions for Distributed Reinforcement Learning, 2017, ICML.
[2] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[3] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[4] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[5] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[6] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[7] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[8] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[9] Jakub W. Pachocki, et al. Dota 2 with Large Scale Deep Reinforcement Learning, 2019, ArXiv.
[10] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[11] Navdeep Jaitly, et al. Discrete Sequential Prediction of Continuous Actions for Deep RL, 2017, ArXiv.
[12] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[13] Andriy Mnih, et al. Q-Learning in enormous action spaces via amortized approximate maximization, 2020, ArXiv.
[14] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[15] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[16] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[17] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[18] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[19] Arash Tavakoli, et al. Action Branching Architectures for Deep Reinforcement Learning, 2017, AAAI.
[20] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[21] Yuval Tassa, et al. DeepMind Control Suite, 2018, ArXiv.
[22] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[23] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, ArXiv.
[24] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[25] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[26] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[27] Andrea Bonarini, et al. MushroomRL: Simplifying Reinforcement Learning Research, 2020, J. Mach. Learn. Res.
[28] Vitaly Levdik, et al. Time Limits in Reinforcement Learning, 2017, ICML.
[29] Ion Stoica, et al. Ray RLLib: A Composable and Scalable Reinforcement Learning Library, 2017, NIPS.
[30] Rousslan Fernand Julien Dossa, et al. CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms, 2021, J. Mach. Learn. Res.
[31] Sergio Gomez Colmenarejo, et al. Acme: A Research Framework for Distributed Reinforcement Learning, 2020, ArXiv.
[32] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[33] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[34] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[35] Pieter Abbeel, et al. rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch, 2019, ArXiv.
[36] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[37] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[38] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.