Exploiting Multiple Secondary Reinforcers in Policy Gradient Reinforcement Learning