Bias and variance in value function estimation
John N. Tsitsiklis | Shie Mannor | Peng Sun | Duncan Simester
[1] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[2] Anne Lohrli. Chapman and Hall, 1985.
[3] Jerzy A. Filar, et al. Variance-Penalized Markov Decision Processes, 1989, Math. Oper. Res.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] J. R. Bult, et al. Optimal Selection for Direct Mail, 1995.
[6] Susana V. Mondschein, et al. Mailing Decisions in the Catalog Sales Industry, 1996.
[7] Füsun F. Gönül, et al. Optimal Mailing of Catalogs: a New Methodology Using Estimable Structural Dynamic Programming Models, 1998.
[8] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[9] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[10] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[11] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.