Implicit Incremental Natural Actor Critic Algorithm
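The paper body is not reproduced on this page, but the cited background fixes the family of methods involved. Below is a minimal, self-contained sketch of a standard incremental natural actor-critic update in the spirit of Kakade [8], Peters & Schaal [6], and Bhatnagar et al. [15], with a TD(0) critic [17] and two-time-scale step sizes [5]. It is not the implicit variant proposed in this paper; the toy environment, features, and step sizes are illustrative assumptions.

```python
import numpy as np

# Sketch of incremental natural actor-critic (NAC) with compatible features.
# Generic textbook form, NOT this paper's implicit algorithm; all names and
# constants below are assumptions made for illustration.

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
gamma = 0.95
alpha_w, alpha_v, alpha_theta = 0.1, 0.05, 0.01  # two-time-scale step sizes [5]

phi = np.eye(n_states)                   # tabular state features
theta = np.zeros((n_states, n_actions))  # softmax policy parameters
w = np.zeros(n_states)                   # critic weights, V(s) = w . phi(s)
nu = np.zeros((n_states, n_actions))     # advantage weights = natural gradient

def policy(s):
    prefs = theta[s]
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def grad_log_pi(s, a):
    # Compatible features psi(s, a) = d/dtheta log pi(a | s) for a softmax policy.
    g = np.zeros_like(theta)
    g[s] = -policy(s)
    g[s, a] += 1.0
    return g

s = 0
for t in range(10_000):
    a = rng.choice(n_actions, p=policy(s))
    s_next = rng.integers(n_states)          # toy random transition (assumption)
    r = 1.0 if a == s % n_actions else 0.0   # toy reward (assumption)

    # Critic: TD(0) error with a linear value function [17].
    delta = r + gamma * w @ phi[s_next] - w @ phi[s]
    w += alpha_w * delta * phi[s]

    # Advantage weights: with compatible features, nu tracks the natural
    # policy gradient (Kakade [8]; Peters & Schaal [6]).
    psi = grad_log_pi(s, a)
    nu += alpha_v * (delta - np.sum(psi * nu)) * psi

    # Actor: ascend along the current natural-gradient estimate.
    theta += alpha_theta * nu
    s = s_next
```

The design point this sketch illustrates is the one the references turn on: because the advantage weights nu are fit with compatible features, they converge to the natural gradient directly, so the actor never forms or inverts a Fisher information matrix.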
[1] Jin Yu, et al. Natural Actor-Critic for Road Traffic Optimisation, 2006, NIPS.
[2] Luca Bascetta, et al. Adaptive Step-Size for Policy Gradient Methods, 2013, NIPS.
[3] Edoardo M. Airoldi, et al. Scalable estimation strategies based on stochastic approximations: classical results and new insights, 2015, Statistics and Computing.
[4] Richard S. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[5] V. Borkar. Stochastic approximation with two time scales, 1997, Systems & Control Letters.
[6] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[7] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[8] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[9] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[10] Kenji Doya. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[11] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[12] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[13] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim.
[14] Isao Ono, et al. Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks, 2010, NIPS.
[15] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[16] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[17] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.