State-Dependent Exploration for Policy Gradient Methods
Jürgen Schmidhuber | Martin Felder | Thomas Rückstieß
[1] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[2] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[3] J. Spall. Implementation of the simultaneous perturbation algorithm for stochastic optimization, 1998.
[4] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[5] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[6] Marco Wiering, et al. Explorations in efficient reinforcement learning, 1999.
[7] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.
[8] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[9] Matthew Saffell, et al. Learning to trade via direct reinforcement, 2001, IEEE Trans. Neural Networks.
[10] Leonid Peshkin, et al. Reinforcement learning for adaptive routing, 2002, Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN'02).
[11] Douglas Aberdeen, et al. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.
[12] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[13] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[16] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[17] Robert B. Fisher, et al. Incremental One-Class Learning with Bounded Computational Complexity, 2007, ICANN.
[18] Jürgen Schmidhuber, et al. Solving Deep Memory POMDPs with Recurrent Policy Gradients, 2007, ICANN.