Adaptive Step-Size for Policy Gradient Methods
Luca Bascetta | Marcello Restelli | Matteo Pirotta
[1] Gang Niu, et al. Analysis and Improvement of Policy Gradient Estimation, 2011, NIPS.
[2] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[3] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008.
[4] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[5] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, 1992.
[6] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[7] David J. Thuente, et al. Line search algorithms with guaranteed sufficient decrease, 1994, TOMS.
[8] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[9] Paul Wagner, et al. A reinterpretation of the policy oscillation phenomenon in approximate policy iteration, 2011, NIPS.
[10] Marc Toussaint, et al. Learning model-free robot control by a Monte Carlo EM algorithm, 2009, Auton. Robots.
[11] Daniele Calandriello, et al. Safe Policy Iteration, 2013, ICML.
[12] H. Robbins. A Stochastic Approximation Method, 1951.
[13] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[14] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2022.
[15] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[16] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.