Zeroth-Order Supervised Policy Improvement
Bolei Zhou | Bo Dai | Yuhang Song | Hao Sun | Ziping Xu | Meng Fang | Zhengyou Zhang | Jiechao Xiong
[1] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[2] Robert Loftin, et al. Better Exploration with Optimistic Actor-Critic, 2019, NeurIPS.
[3] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[4] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[5] Sergey Levine, et al. Nonlinear Inverse Reinforcement Learning with Gaussian Processes, 2011, NIPS.
[6] Kenneth O. Stanley, et al. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents, 2017, NeurIPS.
[7] Yurii Nesterov, et al. Random Gradient-Free Minimization of Convex Functions, 2015, Foundations of Computational Mathematics.
[8] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[9] Yuanqi Li, et al. Policy Search by Target Distribution Learning for Continuous Control, 2019, AAAI.
[10] Vaneet Aggarwal, et al. Escaping Saddle Points for Zeroth-order Non-convex Optimization using Estimated Gradient Descent, 2019, 2020 54th Annual Conference on Information Sciences and Systems (CISS).
[11] Carl E. Rasmussen, et al. Gaussian Processes in Reinforcement Learning, 2003, NIPS.
[12] Nicolas Usunier, et al. Episodic Exploration for Deep Deterministic Policies for StarCraft Micromanagement, 2016, ICLR.
[13] Ying Fan, et al. Efficient Model-Free Reinforcement Learning Using Gaussian Process, 2018, ArXiv.
[14] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[15] Shimon Whiteson, et al. Generalized Off-Policy Actor-Critic, 2019, NeurIPS.
[16] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[17] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[18] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.
[19] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[20] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[21] Georgios Piliouras, et al. Efficiently avoiding saddle points with zero order methods: No gradients required, 2019, NeurIPS.
[22] Yuval Tassa, et al. Relative Entropy Regularized Policy Iteration, 2018, ArXiv.
[23] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[24] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[25] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[26] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[27] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[28] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[29] Sergey Levine, et al. Learning to Reach Goals via Iterated Supervised Learning, 2019, ICLR.
[30] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[31] Jonathan P. How, et al. Sample Efficient Reinforcement Learning with Gaussian Processes, 2014, ICML.
[32] D. Golovin, et al. Gradientless Descent: High-Dimensional Zeroth-Order Optimization, 2019, ICLR.
[33] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[34] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[35] H. Sebastian Seung, et al. Q-Learning for Continuous Actions with Cross-Entropy Guided Policies, 2019, ArXiv.
[36] Benjamin Recht, et al. Simple random search provides a competitive approach to reinforcement learning, 2018, ArXiv.
[37] Vijay R. Konda, et al. Actor-Critic Algorithms, 1999, NIPS.
[38] Sivaraman Balakrishnan, et al. Stochastic Zeroth-order Optimization in High Dimensions, 2017, AISTATS.
[39] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[40] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[41] Chen Tessler, et al. Distributional Policy Optimization: An Alternative Approach for Continuous Control, 2019, NeurIPS.
[42] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[43] Xiaotong Liu, et al. Policy Continuation with Hindsight Inverse Dynamics, 2019, NeurIPS.
[44] Qing Wang, et al. Exponentially Weighted Imitation Learning for Batched Historical Data, 2018, NeurIPS.
[45] Dale Schuurmans, et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning, 2019, ArXiv.
[46] Malte Kuß, et al. Gaussian process models for robust regression, classification, and reinforcement learning, 2006.
[47] Christopher M. Bishop. Mixture Density Networks, 1994.
[48] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[49] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[50] H. Francis Song, et al. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control, 2019, ICLR.
[51] Sungsu Lim, et al. Actor-Expert: A Framework for using Q-learning in Continuous Action Spaces, 2018.
[52] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[53] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[54] Albin Cassirer, et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.
[55] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[56] S. Levine, et al. Learning To Reach Goals Without Reinforcement Learning, 2019, ArXiv.