Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark
[1] V. Gullapalli, et al. Acquiring robot skills via reinforcement learning, 1994, IEEE Control Systems.
[2] Martin A. Riedmiller, et al. Advanced supervised learning in multi-layer perceptrons: From backpropagation to adaptive learning algorithms, 1994.
[3] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[4] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[5] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[6] Michael C. Fu, et al. Feature Article: Optimization for simulation: Theory vs. Practice, 2002, INFORMS J. Comput.
[7] Peter Stone, et al. Policy gradient reinforcement learning for fast quadrupedal locomotion, 2004, IEEE International Conference on Robotics and Automation (ICRA '04).
[8] H. Sebastian Seung, et al. Learning to Walk in 20 Minutes, 2005.
[9] Jun Morimoto, et al. Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid, 2005, AAAI.
[10] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[11] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).