Discovery of Useful Questions as Auxiliary Tasks
Vivek Veeriah | Matteo Hessel | Zhongwen Xu | Janarthanan Rajendran | Richard L. Lewis | Junhyuk Oh | Hado van Hasselt | David Silver | Satinder Singh
[1] Wojciech Czarnecki, et al. Multi-task Deep Reinforcement Learning with PopArt, 2018, AAAI.
[2] Andreas Griewank, et al. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition, 2000, Frontiers in Applied Mathematics.
[3] Jitendra Malik, et al. Learning to Optimize, 2016, ICLR.
[4] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[5] Chris Watkins. Learning from Delayed Rewards, 1989.
[6] Nicolas Le Roux, et al. A Geometric Perspective on Optimal Representations for Reinforcement Learning, 2019, NeurIPS.
[7] David Silver, et al. On Inductive Biases in Deep Reinforcement Learning, 2019, ArXiv.
[8] Richard S. Zemel, et al. Prototypical Networks for Few-shot Learning, 2017, NIPS.
[9] Matteo Hessel, et al. General Non-linear Bellman Equations, 2019, ArXiv.
[10] Misha Denil, et al. Learning to Learn without Gradient Descent by Gradient Descent, 2016, ICML.
[11] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[12] Patrick M. Pilarski, et al. Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction, 2011, AAMAS.
[13] Marcin Andrychowicz, et al. Learning to Learn by Gradient Descent by Gradient Descent, 2016, NIPS.
[14] Richard S. Sutton, et al. Temporal-Difference Networks, 2004, NIPS.
[15] Pieter Abbeel, et al. Automatic Goal Generation for Reinforcement Learning Agents, 2017, ICML.
[16] Trevor Darrell, et al. Loss Is Its Own Reward: Self-Supervision for Reinforcement Learning, 2016, ICLR.
[17] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[18] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[19] Sergey Levine, et al. Diversity Is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[20] Pieter Abbeel, et al. Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments, 2017, ICLR.
[21] Pieter Abbeel, et al. A Simple Neural Attentive Meta-Learner, 2017, ICLR.
[22] Satinder Singh, et al. On Learning Intrinsic Rewards for Policy Gradient Methods, 2018, NeurIPS.
[23] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[24] Sergey Levine, et al. Unsupervised Meta-Learning for Reinforcement Learning, 2018, ArXiv.
[25] Takaki Makino, et al. On-line Discovery of Temporal-Difference Networks, 2008, ICML.
[26] Tom Schaul, et al. Unicorn: Continual Learning with a Universal, Off-policy Agent, 2018, ArXiv.
[27] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[28] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[29] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[30] Misha Denil, et al. Learning to Learn for Global Optimization of Black Box Functions, 2016, ArXiv.
[31] Hugo Larochelle, et al. Optimization as a Model for Few-Shot Learning, 2016, ICLR.
[32] Pieter Abbeel, et al. Some Considerations on Learning to Explore via Meta-Reinforcement Learning, 2018, ICLR.
[33] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[34] Razvan Pascanu, et al. Learning to Navigate in Complex Environments, 2016, ICLR.
[35] Satinder Singh, et al. Many-Goals Reinforcement Learning, 2018, ArXiv.
[36] Sebastian Thrun, et al. Learning to Learn: Introduction and Overview, 1998, Learning to Learn.
[37] Jieyu Zhao, et al. Simple Principles of Metalearning, 1996.
[38] Sergey Levine, et al. Unsupervised Learning via Meta-Learning, 2018, ICLR.
[39] Martha White, et al. Discovery of Predictive Representations With a Network of General Value Functions, 2018.
[40] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[41] Sergey Levine, et al. Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL, 2018, ICLR.
[42] Marcin Andrychowicz, et al. One-Shot Imitation Learning, 2017, NIPS.
[43] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[44] Richard S. Sutton. Learning to Predict by the Methods of Temporal Differences, 1988, Machine Learning.
[45] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[46] Adam M. White. Developing a Predictive Approach to Knowledge, 2015.
[47] Shane Legg, et al. Human-level Control through Deep Reinforcement Learning, 2015, Nature.
[48] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.
[49] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[50] Martin A. Riedmiller, et al. Learning by Playing: Solving Sparse Reward Tasks from Scratch, 2018, ICML.
[51] Michael R. James, et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems, 2004, UAI.
[52] Misha Denil, et al. Learned Optimizers that Scale and Generalize, 2017, ICML.