Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
Richard E. Turner | Zoubin Ghahramani | Bernhard Schölkopf | Timothy Lillicrap | Shixiang Gu | Sergey Levine
[1] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[2] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[3] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[4] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[5] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[6] Koray Kavukcuoglu, et al. Combining policy gradient and Q-learning, 2016, ICLR.
[7] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[8] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[9] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[10] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[11] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[12] Jeff G. Schneider, et al. Covariant Policy Search, 2003, IJCAI.
[13] Pieter Abbeel, et al. On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient, 2010, NIPS.
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[16] D. Hunter, et al. A Tutorial on MM Algorithms, 2004.
[17] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[18] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[19] Koray Kavukcuoglu, et al. PGQ: Combining policy gradient and Q-learning, 2016, ArXiv.
[20] Philip Thomas, et al. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[21] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[22] Sergey Levine, et al. PLATO: Policy learning using adaptive trajectory optimization, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[23] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[24] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[25] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[26] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[27] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[28] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[29] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[30] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[31] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[32] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[33] Richard S. Sutton, et al. Weighted importance sampling for off-policy learning with linear function approximation, 2014, NIPS.
[34] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[35] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.