Junhyuk Oh | Satinder Singh | David Silver | Matteo Hessel | Zhongwen Xu | Hado van Hasselt | Vivek Veeriah | Tom Zahavy
[1] Nan Jiang, et al. The Dependence of Effective Planning Horizon on Model Accuracy, 2015, AAMAS.
[2] Satinder Singh, et al. On Learning Intrinsic Rewards for Policy Gradient Methods, 2018, NeurIPS.
[3] Ryan P. Adams, et al. Gradient-based Hyperparameter Optimization through Reversible Learning, 2015, ICML.
[4] Tom Schaul, et al. Adapting Behaviour for Learning Progress, 2019, ArXiv.
[5] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[6] Fabian Pedregosa, et al. Hyperparameter optimization with approximate gradient, 2016, ICML.
[7] Will Dabney, et al. Adaptive Trade-Offs in Off-Policy Learning, 2020, AISTATS.
[8] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[9] Yoshua Bengio, et al. Hyperbolic Discounting and Learning over Multiple Horizons, 2019, ArXiv.
[10] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[11] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[12] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[13] Martha White, et al. A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning, 2016, AAMAS.
[14] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[15] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[16] Shie Mannor, et al. Adaptive Lambda Least-Squares Temporal Difference Learning, 2016, ArXiv:1612.09465.
[17] Paolo Frasconi, et al. Forward and Reverse Gradient-Based Hyperparameter Optimization, 2017, ICML.
[18] Tianqi Chen, et al. Empirical Evaluation of Rectified Activations in Convolutional Network, 2015, ArXiv.
[19] Richard L. Lewis, et al. Discovery of Useful Questions as Auxiliary Tasks, 2019, NeurIPS.
[20] Matthew E. Taylor, et al. Metatrace: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control, 2018, ArXiv.
[21] Karen Simonyan, et al. Off-Policy Actor-Critic with Shared Experience Replay, 2020, ICML.
[22] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[23] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.
[24] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2013.