A Self-Tuning Actor-Critic Algorithm
Junhyuk Oh | Satinder Singh | David Silver | Matteo Hessel | Zhongwen Xu | Hado van Hasselt | Vivek Veeriah | Tom Zahavy
[1] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[2] David Silver,et al. Meta-Gradient Reinforcement Learning , 2018, NeurIPS.
[3] Paolo Frasconi,et al. Forward and Reverse Gradient-Based Hyperparameter Optimization , 2017, ICML.
[4] Will Dabney,et al. Adaptive Trade-Offs in Off-Policy Learning , 2020, AISTATS.
[5] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[6] Krzysztof Choromanski,et al. Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies , 2020, ArXiv.
[7] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[8] Richard S. Sutton,et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta , 1992, AAAI.
[9] Martha White,et al. A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning , 2016, AAMAS.
[10] Shimon Whiteson,et al. Fast Efficient Hyperparameter Tuning for Policy Gradient Methods , 2019, NeurIPS.
[11] Yoshua Bengio,et al. Hyperbolic Discounting and Learning over Multiple Horizons , 2019, ArXiv.
[12] Shimon Whiteson,et al. Fast Efficient Hyperparameter Tuning for Policy Gradients , 2019, NeurIPS.
[13] Tianqi Chen,et al. Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.
[14] Karen Simonyan,et al. Off-Policy Actor-Critic with Shared Experience Replay , 2020, ICML.
[15] Shie Mannor,et al. Adaptive Lambda Least-Squares Temporal Difference Learning , 2016, 1612.09465.
[16] Nir Levine,et al. An empirical investigation of the challenges of real-world reinforcement learning , 2020, ArXiv.
[17] Matthew E. Taylor,et al. Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control , 2018, IJCAI.
[18] Richard L. Lewis,et al. Discovery of Useful Questions as Auxiliary Tasks , 2019, NeurIPS.
[19] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[20] Yoshua Bengio,et al. Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res.
[21] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[22] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res.
[23] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, ArXiv.
[24] Tom Schaul,et al. Adapting Behaviour for Learning Progress , 2019, ArXiv.
[25] Satinder Singh,et al. On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.
[26] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[27] Fabian Pedregosa,et al. Hyperparameter optimization with approximate gradient , 2016, ICML.