On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient
[1] Peter W. Glynn, et al. Likelihood ratio gradient estimation: an overview, 1987, WSC '87.
[2] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SGAR.
[3] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[4] Jun S. Liu, et al. Sequential Imputations and Bayesian Missing Data Problems, 1994.
[5] Dimitri P. Bertsekas, et al. Nonlinear Programming, 1997.
[6] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[7] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[8] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[9] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[10] J. Baxter, et al. Direct gradient-based reinforcement learning, 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).
[11] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[12] Jun S. Liu, et al. Monte Carlo strategies in scientific computing, 2001.
[13] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[14] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[15] Shin Ishii, et al. Reinforcement Learning for CPG-Driven Biped Robot, 2004, AAAI.
[16] Peter Stone, et al. Policy gradient reinforcement learning for fast quadrupedal locomotion, 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.
[17] Xi-Ren Cao, et al. A basic formula for online policy gradient algorithms, 2005, IEEE Transactions on Automatic Control.
[18] H. Sebastian Seung, et al. Learning to Walk in 20 Minutes, 2005.
[19] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[20] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[21] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[22] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML '07.
[23] Andrew Y. Ng, et al. Learning omnidirectional path following using dimensionality reduction, 2007, Robotics: Science and Systems.
[24] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[25] Martin A. Riedmiller, et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[26] C. Stachniss, et al. Learning Omnidirectional Path Following Using Dimensionality Reduction, 2008.
[27] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2008, NIPS 2008.
[28] P. Glynn. Likelihood Ratio Gradient Estimation: An Overview, 2022.
[29] Jan Peters. Machine Learning of Motor Skills for Robotics, 2022.