[1] Satinder Singh, et al. On Learning Intrinsic Rewards for Policy Gradient Methods, 2018, NeurIPS.
[2] Wolfram Burgard, et al. Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration, 2019, ArXiv.
[3] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[4] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[5] Yuan Zhou, et al. Exploration via Hindsight Goal Generation, 2019, NeurIPS.
[6] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[7] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[8] Hussein A. Abbass, et al. Hierarchical Deep Reinforcement Learning for Continuous Action Control, 2018, IEEE Transactions on Neural Networks and Learning Systems.
[9] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[10] Honglak Lee, et al. Contingency-Aware Exploration in Reinforcement Learning, 2018, ICLR.
[11] Sergey Levine, et al. Model-Based Reinforcement Learning for Atari, 2019, ICLR.
[12] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[13] Deepak Pathak, et al. Self-Supervised Exploration via Disagreement, 2019, ICML.
[14] Daniel Guo, et al. Agent57: Outperforming the Atari Human Benchmark, 2020, ICML.
[15] Pieter Abbeel, et al. Planning to Explore via Self-Supervised World Models, 2020, ICML.
[16] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[17] Albin Cassirer, et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.
[18] Lucas Beyer, et al. MULEX: Disentangling Exploitation from Exploration in Deep RL, 2019, ArXiv.
[19] Rui Zhao, et al. Maximum Entropy-Regularized Multi-Goal Reinforcement Learning, 2019, ICML.
[20] Honglak Lee, et al. Learning Structured Output Representation using Deep Conditional Generative Models, 2015, NIPS.
[21] Marc Pollefeys, et al. Episodic Curiosity through Reachability, 2018, ICLR.
[22] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[23] Daniel Guo, et al. Never Give Up: Learning Directed Exploration Strategies, 2020, ICLR.
[24] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[25] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[26] Daoyi Dong, et al. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning, 2018, IEEE Transactions on Neural Networks and Learning Systems.
[27] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.
[28] Wojciech Jaskowski, et al. Model-Based Active Exploration, 2018, ICML.
[29] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[30] Robert Loftin, et al. Better Exploration with Optimistic Actor-Critic, 2019, NeurIPS.
[31] Sergey Levine, et al. EMI: Exploration with Mutual Information, 2018, ICML.
[32] Shimon Whiteson, et al. Optimistic Exploration even with a Pessimistic Initialisation, 2020, ICLR.
[33] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[34] Sergey Levine, et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[35] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[36] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[37] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[38] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[39] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[40] Dmitry Vetrov, et al. Variational Autoencoder with Arbitrary Conditioning, 2018, ICLR.
[41] Zhang-Wei Hong, et al. Diversity-Driven Exploration Strategy for Deep Reinforcement Learning, 2018, NeurIPS.
[42] Greg Turk, et al. Learning Novel Policies For Tasks, 2019, ICML.
[43] Daniel L. K. Yamins, et al. Learning to Play with Intrinsically-Motivated Self-Aware Agents, 2018, NeurIPS.
[44] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[45] Wilker Aziz, et al. A Stochastic Decoder for Neural Machine Translation, 2018, ACL.
[46] Sergey Levine, et al. DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, 2018, ACM Trans. Graph.
[47] Junhyuk Oh, et al. What Can Learned Intrinsic Rewards Capture?, 2019, ICML.
[48] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[49] Sergey Levine, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016, Int. J. Robotics Res.
[50] E. Deci, et al. Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions, 2000, Contemporary Educational Psychology.
[51] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[52] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[53] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.