Self-Imitation Learning
Satinder Singh | Honglak Lee | Junhyuk Oh | Yijie Guo
[1] Demis Hassabis,et al. Neural Episodic Control , 2017, ICML.
[2] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[3] Joel Z. Leibo,et al. Model-Free Episodic Control , 2016, ArXiv.
[4] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[5] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[6] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[7] Andrew W. Moore,et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping , 1992, NIPS.
[8] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[9] Quoc V. Le,et al. Neural Program Synthesis with Priority Queue Training , 2018, ArXiv.
[10] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[11] Koray Kavukcuoglu,et al. Combining policy gradient and Q-learning , 2016, ICLR.
[12] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[13] Filip De Turck,et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning , 2016, NIPS.
[14] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[15] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[16] Long-Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[17] Yang Liu,et al. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening , 2016, ICLR.
[18] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[19] Marcin Andrychowicz,et al. Overcoming Exploration in Reinforcement Learning with Demonstrations , 2018, IEEE International Conference on Robotics and Automation (ICRA).
[20] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[21] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[22] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[23] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[24] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[25] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[26] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[27] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010.
[28] Chen Liang,et al. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision , 2016, ACL.
[29] Pieter Abbeel,et al. Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.
[30] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
[31] Jürgen Schmidhuber,et al. Adaptive confidence and adaptive curiosity , 1991, Forschungsberichte, TU Munich.
[32] Marc G. Bellemare,et al. Count-Based Exploration with Neural Density Models , 2017, ICML.
[33] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[34] Peter Dayan,et al. Hippocampal Contributions to Control: The Third Way , 2007, NIPS.
[35] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[36] Marc G. Bellemare,et al. The Reactor: A Sample-Efficient Actor-Critic Architecture , 2017, ArXiv.
[37] D. Sofge. The Role of Exploration in Learning Control , 1992.
[38] Yang Gao,et al. Reinforcement Learning from Imperfect Demonstrations , 2018, ICLR.
[39] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[40] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.