Paper information - Bias and variance in value function estimation

Bias and variance in value function estimation

We consider the bias and variance of value function estimation that are caused by using an empirical model instead of the true model. We analyze this bias and variance for Markov processes from a classical (frequentist) statistical point of view, and in a Bayesian setting. Using a second-order approximation, we provide explicit expressions for the bias and variance in terms of the transition counts and the reward statistics. We present supporting experiments with artificial Markov chains and with a large transactional database provided by a mail-order catalog firm.
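The effect described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's method: it measures the bias and variance by Monte Carlo simulation and does not use the second-order expressions. It assumes a hypothetical two-state Markov reward process with known rewards, where only the transition probabilities are estimated from counts. The names `P_TRUE`, `R`, `GAMMA`, `value_2state`, `empirical_model`, and `bias_and_variance` are all illustrative and do not come from the paper.

```python
import random

# Hypothetical 2-state Markov reward process (all values are assumptions).
P_TRUE = [[0.9, 0.1], [0.2, 0.8]]  # true transition probabilities
R = [1.0, 0.0]                     # rewards, assumed known exactly
GAMMA = 0.9                        # discount factor


def value_2state(P, r, gamma):
    """Solve V = r + gamma * P V in closed form for a 2-state process."""
    # A = I - gamma * P, inverted with the 2x2 formula.
    a = 1.0 - gamma * P[0][0]
    b = -gamma * P[0][1]
    c = -gamma * P[1][0]
    d = 1.0 - gamma * P[1][1]
    det = a * d - b * c
    return [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def empirical_model(P, n, rng):
    """Estimate each row of P from n sampled transitions (count / n)."""
    P_hat = []
    for row in P:
        to_zero = sum(1 for _ in range(n) if rng.random() < row[0])
        P_hat.append([to_zero / n, 1.0 - to_zero / n])
    return P_hat


def bias_and_variance(P, r, gamma, n, trials, seed=0):
    """Monte Carlo bias and variance of the estimated value of state 0."""
    rng = random.Random(seed)
    true_v = value_2state(P, r, gamma)[0]
    estimates = [
        value_2state(empirical_model(P, n, rng), r, gamma)[0]
        for _ in range(trials)
    ]
    mean = sum(estimates) / trials
    variance = sum((e - mean) ** 2 for e in estimates) / (trials - 1)
    return mean - true_v, variance


if __name__ == "__main__":
    for n in (10, 500):
        bias, var = bias_and_variance(P_TRUE, R, GAMMA, n, trials=300)
        print(f"n={n}: bias={bias:.4f}, variance={var:.4f}")
```

Because the value function is a nonlinear function of the transition matrix, plugging in an unbiased estimate of the transition probabilities generally does not give an unbiased value estimate. Running the sketch with a small and a large number of observed transitions per state shows how both error terms depend on the transition counts.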
