Model-based Policy Gradient Reinforcement Learning
[1] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995.
[2] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998.
[3] Yishay Mansour,et al. Approximate Planning in Large POMDPs via Reusable Trajectories , 1999, NIPS.
[4] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[5] Kee-Eung Kim,et al. Approximate Solutions to Factored Markov Decision Processes via Greedy Search in the Space of Finite State Controllers , 2000, AIPS.
[6] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[7] Xin Wang,et al. Batch Value Function Approximation via Support Vectors , 2001, NIPS.
[8] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[9] Andrew Y. Ng,et al. Policy Search via Density Estimation , 1999, NIPS.
[10] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res.
[11] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[12] J. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS 1999.
[13] Matthew L. Ginsberg,et al. Limited Discrepancy Search , 1995, IJCAI.
[14] Andrew W. Moore,et al. Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.
[15] Craig Boutilier,et al. Exploiting Structure in Policy Construction , 1995, IJCAI.
[16] Peter L. Bartlett,et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent , 2000, ICML.
[17] Leslie Pack Kaelbling,et al. Learning Policies with External Memory , 1999, ICML.
[18] Douglas Aberdeen,et al. Scalable Internal-State Policy-Gradient Methods for POMDPs , 2002, ICML.
[19] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[20] Carlos Guestrin,et al. Max-norm Projections for Factored MDPs , 2001, IJCAI.
[21] Christian R. Shelton,et al. Policy Improvement for POMDPs Using Normalized Importance Sampling , 2001, UAI.
[22] Jonathan Baxter,et al. Scaling Internal-State Policy-Gradient Methods for POMDPs , 2002.