Policy Gradient Reinforcement Learning with Environmental Dynamics and Action-Values in Policies
[1] Kee-Eung Kim et al. Learning to Cooperate via Policy Search, 2000, UAI.
[2] Harukazu Igarashi et al. Behavior Learning Based on a Policy Gradient Method: Separation of Environmental Dynamics and State Values in Policies, 2008, PRICAI.
[3] Harukazu Igarashi et al. Applying the Policy Gradient Method to Behavior Learning in Multiagent Systems: The Pursuit Problem, 2006.
[4] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[5] Andrew G. Barto et al. Reinforcement Learning, 1998.
[6] John N. Tsitsiklis et al. Actor-Critic Algorithms, 1999, NIPS.
[7] Shigenobu Kobayashi et al. Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward, 1995, ICML.
[8] Richard S. Sutton et al. Introduction to Reinforcement Learning, 1998.
[9] Masaomi Kimura et al. Reinforcement Learning in Non-Markov Decision Processes: Statistical Properties of Characteristic Eligibility, 2008.
[10] Richard S. Sutton et al. Reinforcement Learning, 1992, Handbook of Machine Learning.
[11] Ronald J. Williams et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[12] Andrew W. Moore et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.