Junhyuk Oh | Matteo Hessel | Wojciech M. Czarnecki | Zhongwen Xu | Hado van Hasselt | Satinder Singh | David Silver
[1] Louis Wehenkel,et al. Policy Search in a Space of Simple Closed-form Formulas: Towards Interpretability of Reinforcement Learning , 2012, Discovery Science.
[2] Satinder Singh,et al. On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.
[3] Sergey Levine,et al. Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm , 2017, ICLR.
[4] Junhyuk Oh,et al. Meta-Gradient Reinforcement Learning with an Objective Discovered Online , 2020, NeurIPS.
[5] Yoshua Bengio,et al. Learning a synaptic learning rule , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[6] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[7] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[8] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[9] Louis Wehenkel,et al. Automatic Discovery of Ranking Formulas for Playing with Multi-armed Bandits , 2011, EWRL.
[10] Leslie Pack Kaelbling,et al. Meta-learning curiosity algorithms , 2020, ICLR.
[11] Marc G. Bellemare,et al. Distributional Reinforcement Learning with Quantile Regression , 2017, AAAI.
[12] Sebastian Thrun,et al. Learning One More Thing , 1994, IJCAI.
[13] Razvan Pascanu,et al. Meta-Learning with Warped Gradient Descent , 2020, ICLR.
[14] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[15] Junhyuk Oh,et al. What Can Learned Intrinsic Rewards Capture? , 2019, ICML.
[16] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[17] Tor Lattimore,et al. Behaviour Suite for Reinforcement Learning , 2019, ICLR.
[18] Jürgen Schmidhuber,et al. A ‘Self-Referential’ Weight Matrix , 1993.
[19] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[20] Daan Wierstra,et al. Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.
[21] Tie-Yan Liu,et al. Beyond Exponentially Discounted Sum: Automatic Learning of Return Function , 2019, ArXiv.
[22] Junhyuk Oh,et al. A Self-Tuning Actor-Critic Algorithm , 2020, NeurIPS.
[23] Kenneth O. Stanley,et al. Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity , 2018, ICLR.
[24] Joshua Achiam,et al. On First-Order Meta-Learning Algorithms , 2018, ArXiv.
[25] David Silver,et al. Meta-Gradient Reinforcement Learning , 2018, NeurIPS.
[26] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.
[27] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[28] J. Schulman,et al. Reptile: a Scalable Metalearning Algorithm , 2018.
[29] Risto Miikkulainen,et al. A Neuroevolution Approach to General Atari Game Playing , 2014, IEEE Transactions on Computational Intelligence and AI in Games.
[30] Richard S. Sutton,et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta , 1992, AAAI.
[31] Pieter Abbeel,et al. Evolved Policy Gradients , 2018, NeurIPS.
[32] Louis Kirsch,et al. Improving Generalization in Meta Reinforcement Learning using Learned Objectives , 2020, ICLR.
[33] Yevgen Chebotar,et al. Meta Learning via Learned Loss , 2019, 2020 25th International Conference on Pattern Recognition (ICPR).
[34] Wei Zhou,et al. Online Meta-Critic Learning for Off-Policy Actor-Critic Methods , 2020, NeurIPS.
[35] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[36] Kenneth O. Stanley,et al. Differentiable plasticity: training plastic neural networks with backpropagation , 2018, ICML.
[37] Richard L. Lewis,et al. Discovery of Useful Questions as Auxiliary Tasks , 2019, NeurIPS.
[38] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[39] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[40] Peter L. Bartlett,et al. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.
[41] Karol Gregor. Finding online neural update rules by learning to remember , 2020, ArXiv.
[42] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[43] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[44] Jan Feyereisl,et al. BADGER: Learning to (Learn [Learning Algorithms] through Multi-Agent Communication) , 2019, ArXiv.
[45] Zeb Kurth-Nelson,et al. Learning to reinforcement learn , 2016, CogSci.