Bias in Natural Actor-Critic Algorithms
[1] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[2] D. Bertsekas. Gradient convergence in gradient methods, 1997.
[3] Csaba Szepesvári, et al. The Asymptotic Convergence-Rate of Q-learning, 1997, NIPS.
[4] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[5] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[6] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[7] John N. Tsitsiklis, et al. Gradient Convergence in Gradient methods with Errors, 1999, SIAM J. Optim.
[8] Michail G. Lagoudakis, et al. Model-Free Least-Squares Policy Iteration, 2001, NIPS.
[9] Sham M. Kakade, et al. Optimizing Average Reward Using Discounted Rewards, 2001, COLT/EuroCOLT.
[10] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[11] Jeff G. Schneider, et al. Covariant Policy Search, 2003, IJCAI.
[12] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[15] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[16] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[17] Junichiro Yoshimoto, et al. A Generalized Natural Actor-Critic Algorithm, 2009, NIPS.
[18] Jan Peters, et al. Policy Gradient Methods, 2010, Encyclopedia of Machine Learning.
[19] Patrick M. Pilarski, et al. Model-Free reinforcement learning with continuous action in practice, 2012, 2012 American Control Conference (ACC).