A Self-Tuning Actor-Critic Algorithm
Junhyuk Oh | Satinder Singh | Matteo Hessel | H. V. Hasselt | Zhongwen Xu | Vivek Veeriah | Tom Zahavy | David Silver
[1] Richard S. Sutton,et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.
[2] Yoshua Bengio,et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.
[3] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2013.
[4] Tianqi Chen,et al. Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.
[5] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[6] Shie Mannor,et al. Adaptive Lambda Least-Squares Temporal Difference Learning, 2016, arXiv:1612.09465.
[7] Martha White,et al. A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning , 2016, AAMAS.
[8] Fabian Pedregosa,et al. Hyperparameter optimization with approximate gradient , 2016, ICML.
[9] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[10] Max Jaderberg,et al. Population Based Training of Neural Networks , 2017, ArXiv.
[11] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[12] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[13] Paolo Frasconi,et al. Forward and Reverse Gradient-Based Hyperparameter Optimization , 2017, ICML.
[14] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, ArXiv.
[15] David Silver,et al. Meta-Gradient Reinforcement Learning , 2018, NeurIPS.
[16] Satinder Singh,et al. On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.
[17] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[18] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[19] Shimon Whiteson,et al. Fast Efficient Hyperparameter Tuning for Policy Gradient Methods , 2019, NeurIPS.
[20] Tom Schaul,et al. Adapting Behaviour for Learning Progress , 2019, ArXiv.
[21] Richard L. Lewis,et al. Discovery of Useful Questions as Auxiliary Tasks , 2019, NeurIPS.
[22] Shimon Whiteson,et al. Fast Efficient Hyperparameter Tuning for Policy Gradients , 2019, NeurIPS.
[23] Matthew E. Taylor,et al. Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control , 2018, IJCAI.
[24] Yoshua Bengio,et al. Hyperbolic Discounting and Learning over Multiple Horizons , 2019, ArXiv.
[25] R. Munos,et al. Adaptive Trade-Offs in Off-Policy Learning , 2019, AISTATS.
[26] D. Mankowitz,et al. An empirical investigation of the challenges of real-world reinforcement learning , 2020, ArXiv.
[27] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[28] Krzysztof Choromanski,et al. Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies , 2020, ArXiv.
[29] Off-Policy Actor-Critic with Shared Experience Replay, 2019, ICML.