Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
Tingting Zhao | Hirotaka Hachiya | Voot Tangkaratt | Jun Morimoto | Masashi Sugiyama
[1] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[2] Gang Niu, et al. Analysis and Improvement of Policy Gradient Estimation, 2011, NIPS.
[3] Masashi Sugiyama, et al. Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning, 2011, Neural Computation.
[4] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[5] H. Shimodaira, et al. Improving predictive inference under covariate shift by weighting the log-likelihood function, 2000.
[6] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[7] Kenji Doya, et al. Natural actor-critic with baseline adjustment for variance reduction, 2008, Artificial Life and Robotics.
[8] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[9] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[10] George S. Fishman. Monte Carlo: Concepts, Algorithms, and Applications, 1996, Springer.
[11] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[12] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[13] S. Vijayakumar, et al. Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling, 2004.
[14] Jun Morimoto, et al. CB: A Humanoid Research Platform for Exploring NeuroScience, 2006, 6th IEEE-RAS International Conference on Humanoid Robots.
[15] Pawel Wawrzynski, et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay, 2009, Neural Networks.
[16] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[17] Nicolas Meuleau, et al. Exploration in Gradient-Based Reinforcement Learning, 2001.
[18] Frank Sehnke, et al. Parameter-exploring policy gradients, 2010, Neural Networks.
[19] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.
[20] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.
[21] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[22] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[23] Lex Weaver, et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, 2001, UAI.
[24] Yishay Mansour, et al. Learning Bounds for Importance Weighting, 2010, NIPS.
[25] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[26] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[27] Jun Morimoto, et al. Adaptive Step-size Policy Gradients with Average Reward Metric, 2010, ACML.
[28] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[29] Richard S. Sutton. Temporal credit assignment in reinforcement learning, 1984.