Exploring Restart Distributions.
Vitaly Levdik | Arash Tavakoli | Petar Kormushev | Christopher M. Smith | Riashat Islam
[1] Marcin Andrychowicz, et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research, 2018, ArXiv.
[2] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[3] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[4] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[5] Richard S. Sutton, et al. Planning by Prioritized Sweeping with Small Backups, 2013, ICML.
[6] Satinder Singh, et al. Self-Imitation Learning, 2018, ICML.
[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[8] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[9] Shimon Whiteson, et al. OFFER: Off-Environment Reinforcement Learning, 2017, AAAI.
[10] Kenneth O. Stanley, et al. Go-Explore: a New Approach for Hard-Exploration Problems, 2019, ArXiv.
[11] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[14] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[15] Atil Iscen, et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, 2018, Robotics: Science and Systems.
[16] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[17] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[18] Yuval Tassa, et al. Data-efficient Deep Reinforcement Learning for Dexterous Manipulation, 2017, ArXiv.
[19] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[20] Pieter Abbeel, et al. Reverse Curriculum Generation for Reinforcement Learning, 2017, CoRL.
[21] Sham M. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[22] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[23] Vitaly Levdik, et al. Time Limits in Reinforcement Learning, 2017, ICML.
[24] Tim Salimans, et al. Learning Montezuma's Revenge from a Single Demonstration, 2018, ArXiv.
[25] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.