Model-Free Least-Squares Policy Iteration
[1] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[2] Kazuo Tanaka, et al. An approach to fuzzy control of nonlinear systems: stability and design issues, 1996, IEEE Trans. Fuzzy Syst.
[3] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[4] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[5] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[6] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[7] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[8] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[9] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[10] Andrew Y. Ng, et al. Policy Search via Density Estimation, 1999, NIPS.
[11] Daphne Koller, et al. Policy Iteration for Factored MDPs, 2000, UAI.
[12] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.
[13] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[14] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.