Learning to Reach Goals via Iterated Supervised Learning
Dibya Ghosh | Abhishek Gupta | Ashwin Reddy | Justin Fu | Coline Devin | Benjamin Eysenbach | Sergey Levine