Relative Entropy Policy Search
Yasemin Altun | Jan Peters | Katharina Mülling