Geometric Variance Reduction in Markov Chains. Application to Value Function and Gradient Estimation
[1] J. Hammersley,et al. Monte Carlo Methods , 1965 .
[2] J. Halton. A Retrospective and Prospective Survey of the Monte Carlo Method , 1970 .
[3] Alan Weiss,et al. Sensitivity analysis via likelihood ratios , 1986, WSC '86.
[4] Peter W. Glynn,et al. Likelihood ratio gradient estimation: an overview , 1987, WSC '87.
[5] J. Halton. Sequential Monte Carlo techniques for the solution of linear systems , 1994 .
[6] Dennis D. Cox,et al. Adaptive importance sampling on discrete Markov chains , 1999 .
[7] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[8] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[9] Trevor Hastie,et al. The Elements of Statistical Learning , 2001 .
[10] John N. Tsitsiklis,et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes , 2003, Discret. Event Dyn. Syst..
[11] S. Maire. An iterative computation of approximations on Korobov-like spaces , 2003 .
[12] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[13] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[14] Andrew W. Moore,et al. Locally Weighted Learning , 1997, Artificial Intelligence Review.
[15] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[16] Sylvain Maire,et al. Sequential Control Variates for Functionals of Markov Processes , 2005, SIAM J. Numer. Anal..