Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms
[1] Marwan A. Jabri, et al. Weight Perturbation: An Optimal Architecture and Learning Technique for Analog VLSI Feedforward and Recurrent Multilayer Networks, 1991, Neural Comput.
[2] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[3] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[4] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[5] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[6] Stefan Schaal, et al. Reinforcement Learning for Humanoid Robotics, 2003.
[7] Stefan Schaal, et al. Policy Gradient Methods for Robot Control, 2003.
[8] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[9] Peter Stone, et al. Policy gradient reinforcement learning for fast quadrupedal locomotion, 2004, IEEE International Conference on Robotics and Automation (ICRA '04).
[10] H. Sebastian Seung, et al. Stochastic policy gradient reinforcement learning on a simple 3D biped, 2004, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[11] Jun Zhang, et al. Symmetry breaking leads to forward flapping flight, 2004, Journal of Fluid Mechanics.
[12] Jun Zhang, et al. Heavy flags undergo spontaneous oscillations in flowing water, 2005, Physical Review Letters.
[13] A. Willsky, et al. Importance sampling actor-critic algorithms, 2006, American Control Conference.
[14] Leslie Pack Kaelbling, et al. Off-Policy Policy Search, 2007.
[15] Martin A. Riedmiller, et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.