Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
Evan Zheran Liu | Aditi Raghunathan | Percy Liang | Chelsea Finn
[1] Yee Whye Teh, et al. Meta reinforcement learning as task inference, 2019, ArXiv.
[2] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[3] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[4] C. A. Nelson, et al. Learning to Learn, 2017, Encyclopedia of Machine Learning and Data Mining.
[5] Kevin Swersky, et al. An Imitation Learning Approach for Cache Replacement, 2020, ICML.
[6] Changjie Fan, et al. Learn to Effectively Explore in Context-Based Meta-RL, 2020, ArXiv.
[7] Dale Schuurmans, et al. Learning to Generalize from Sparse and Underspecified Rewards, 2019, ICML.
[8] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[9] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[10] Sergey Levine, et al. Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning, 2018, ICLR.
[11] Sergey Levine, et al. Watch, Try, Learn: Meta-Learning from Demonstrations and Reward, 2019, ICLR.
[12] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[13] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[14] Katja Hofmann, et al. Meta Reinforcement Learning with Latent Variable Gaussian Processes, 2018, UAI.
[15] Atil Iscen, et al. NoRML: No-Reward Meta Learning, 2019, AAMAS.
[16] Pieter Abbeel, et al. Evolved Policy Gradients, 2018, NeurIPS.
[17] S. Levine, et al. Guided Meta-Policy Search, 2019, NeurIPS.
[18] Percy Liang, et al. Learning Abstract Models for Strategic Exploration and Fast Reward Transfer, 2020, ArXiv.
[19] Pieter Abbeel, et al. The Importance of Sampling in Meta-Reinforcement Learning, 2018, NeurIPS.
[20] Yoshua Bengio, et al. Learning a synaptic learning rule, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[21] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[22] Richard J. Mammone, et al. Meta-neural networks that learn by learning, 1992, IJCNN International Joint Conference on Neural Networks.
[23] Aviv Tamar, et al. Offline Meta Reinforcement Learning, 2020, ArXiv.
[24] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008, JMLR.
[25] David Barber, et al. The IM algorithm: a variational approach to Information Maximization, 2003, NIPS.
[26] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[27] Sergey Levine, et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[28] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933, Biometrika.
[29] David Warde-Farley, et al. Unsupervised Control Through Non-Parametric Discriminative Rewards, 2018, ICLR.
[30] Katia Sycara, et al. MAME: Model-Agnostic Meta-Exploration, 2019, CoRL.
[31] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[32] Daan Wierstra, et al. Variational Intrinsic Control, 2016, ICLR.
[33] Pieter Abbeel, et al. A Simple Neural Attentive Meta-Learner, 2017, ICLR.
[34] Shimon Whiteson, et al. VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, 2020, ICLR.
[35] Alexander A. Alemi, et al. Deep Variational Information Bottleneck, 2017, ICLR.
[36] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[37] Sergey Levine, et al. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, 2019, ICML.
[38] Jordi Grau-Moya, et al. A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment, 2019, NeurIPS.
[39] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[40] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[41] Benjamin Van Roy, et al. A Tutorial on Thompson Sampling, 2017, Found. Trends Mach. Learn.
[42] Daan Wierstra, et al. One-shot Learning with Memory-Augmented Neural Networks, 2016, ArXiv.
[43] Sergey Levine, et al. Meta-Reinforcement Learning of Structured Exploration Strategies, 2018, NeurIPS.
[44] Voot Tangkaratt, et al. Meta-Model-Based Meta-Policy Optimization, 2020, ArXiv.
[45] Sergey Levine, et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning, 2019, CoRL.
[46] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[47] Ludovic Denoyer, et al. Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization, 2020, ArXiv.
[48] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[49] Sepp Hochreiter, et al. Learning to Learn Using Gradient Descent, 2001, ICANN.
[50] Yoshua Bengio, et al. On the Optimization of a Synaptic Learning Rule, 2007.
[51] Abhinav Gupta, et al. Environment Probing Interaction Policies, 2019, ICLR.
[52] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.