Learning to Reach Goals via Iterated Supervised Learning.
Abhishek Gupta | Justin Fu | Sergey Levine | Coline Devin | Benjamin Eysenbach | Dibya Ghosh | Ashwin Reddy
[1] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, ArXiv.
[2] Filipe Wall Mutz, et al. Hindsight policy gradients, 2017, ICLR.
[3] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[4] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[5] Leslie Pack Kaelbling. Learning to Achieve Goals, 1993, IJCAI.
[6] Stefan Schaal, et al. A Generalized Path Integral Control Approach to Reinforcement Learning, 2010, J. Mach. Learn. Res.
[7] Martin A. Riedmiller, et al. Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards, 2017, ArXiv.
[8] Mohamed Medhat Gaber, et al. Imitation Learning, 2017, ACM Comput. Surv.
[9] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[10] Matteo Hessel, et al. Deep Reinforcement Learning and the Deadly Triad, 2018, ArXiv.
[11] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[12] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.
[13] Dale Schuurmans, et al. Improving Policy Gradient by Exploring Under-appreciated Rewards, 2016, ICLR.
[14] Sergey Levine, et al. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning, 2019, NeurIPS.
[15] Jan Peters, et al. Fitted Q-iteration by Advantage Weighted Regression, 2008, NIPS.
[16] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML.
[17] Sergey Levine, et al. Visual Reinforcement Learning with Imagined Goals, 2018, NeurIPS.
[18] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[19] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[20] Sergey Levine, et al. Learning Actionable Representations with Goal-Conditioned Policies, 2018, ICLR.
[21] Stefan Schaal, et al. Robot Programming by Demonstration, 2009, Springer Handbook of Robotics.
[22] Leemon C. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[23] Sergey Levine, et al. Temporal Difference Models: Model-Free Deep RL for Model-Based Control, 2018, ICLR.
[24] Sergey Levine, et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations, 2017, Robotics: Science and Systems.
[25] Satinder Singh, et al. Self-Imitation Learning, 2018, ICML.
[26] Xin Zhang, et al. End to End Learning for Self-Driving Cars, 2016, ArXiv.
[27] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[28] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[29] Sergey Levine, et al. Learning Latent Plans from Play, 2019, CoRL.
[30] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[31] Tom Schaul, et al. Deep Q-learning From Demonstrations, 2017, AAAI.
[32] Dale Schuurmans, et al. Reward Augmented Maximum Likelihood for Neural Structured Prediction, 2016, NIPS.
[33] Shakir Mohamed, et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, 2015, NIPS.
[34] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[35] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[36] Yaoliang Yu, et al. Distributional Reinforcement Learning for Efficient Exploration, 2019, ICML.
[37] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[38] Henry Zhu, et al. ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, 2019, CoRL.
[39] Vladlen Koltun, et al. Semi-parametric Topological Memory for Navigation, 2018, ICLR.
[40] Prabhat Nagarajan, et al. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations, 2019, ICML.
[41] Dean Pomerleau. ALVINN: An Autonomous Land Vehicle in a Neural Network, 2015.
[42] Pieter Abbeel, et al. Automatic Goal Generation for Reinforcement Learning Agents, 2017, ICML.
[43] Jitendra Malik, et al. Zero-Shot Visual Imitation, 2018, CVPR Workshops (CVPRW).
[44] Pieter Abbeel, et al. Goal-conditioned Imitation Learning, 2019, NeurIPS.
[45] Sergey Levine, et al. Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning, 2019, CoRL.
[46] Michael L. Littman, et al. The Cross-Entropy Method Optimizes for Quantiles, 2013, ICML.