Effective Linear Policy Gradient Search through Primal-Dual Approximation
[1] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[2] Yuval Tassa,et al. Infinite-Horizon Model Predictive Control for Periodic Tasks with Contacts , 2011, Robotics: Science and Systems.
[3] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003.
[4] Anil A. Bharath,et al. Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.
[5] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res.
[6] Yuval Tassa,et al. Synthesis and stabilization of complex behaviors through online trajectory optimization , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[7] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[8] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[9] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[10] V. Borkar. Stochastic approximation with two time scales , 1997.
[11] Atil Iscen,et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.
[12] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[13] Lin Xiao,et al. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization , 2009, J. Mach. Learn. Res.
[14] Sham M. Kakade,et al. Towards Generalization and Simplicity in Continuous Control , 2017, NIPS.
[15] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[16] Mohammad Ghavamzadeh,et al. Variance-constrained actor-critic algorithms for discounted and average reward MDPs , 2014, Machine Learning.
[17] Mark W. Schmidt,et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets , 2012, NIPS.
[18] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008.
[19] Yurii Nesterov,et al. Primal-dual subgradient methods for convex problems , 2005, Math. Program..
[20] Gregory Dudek,et al. Benchmark Environments for Multitask Learning in Continuous Domains , 2017, ArXiv.
[21] Benjamin Recht,et al. Simple random search provides a competitive approach to reinforcement learning , 2018, ArXiv.
[22] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML '08.
[23] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[24] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[25] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[26] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.