A Natural Policy Gradient
暂无分享,去创建一个
[1] C. J.,et al. Maximum Likelihood and Covariant Algorithms for Independent Component Analysis , 1996 .
[2] Michael I. Jordan,et al. On Convergence Properties of the EM Algorithm for Gaussian Mixtures , 1996, Neural Computation.
[3] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[4] Geoffrey E. Hinton,et al. Using Expectation-Maximization for Reinforcement Learning , 1997, Neural Computation.
[5] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[6] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[7] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[8] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[9] Geoffrey E. Hinton,et al. Using EM for Reinforcement Learning , 2000 .
[10] J. Baxter,et al. Direct gradient-based reinforcement learning , 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).
[11] Sham M. Kakade,et al. Optimizing Average Reward Using Discounted Rewards , 2001, COLT/EuroCOLT.