Biases and Variance in Value Function Estimates

We consider a Markov Decision Process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.
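
As a rough illustration of the effect the abstract describes, the sketch below estimates the transition probabilities of a two-state chain under a fixed policy from a finite sample, plugs the estimates into the Bellman equation, and measures the resulting bias and variance by simulation. It is not the paper's closed-form approximation. The chain, rewards, discount factor, sample sizes, and all function names are assumptions chosen only for this example.

```python
import random
import statistics


def value_function(p01, p10, r0, r1, gamma):
    """Solve V = r + gamma * P V for a 2-state chain under a fixed policy.

    p01 = P(0 -> 1), p10 = P(1 -> 0). Solves the 2x2 system (I - gamma P) V = r.
    """
    a, b = 1 - gamma * (1 - p01), -gamma * p01
    c, d = -gamma * p10, 1 - gamma * (1 - p10)
    det = a * d - b * c  # positive whenever 0 <= gamma < 1
    return ((d * r0 - b * r1) / det, (a * r1 - c * r0) / det)


def simulate_estimates(p01, p10, r0, r1, gamma, n_samples, n_trials, seed=0):
    """Return plug-in value estimates built from empirical transition frequencies."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # Observe n_samples transitions out of each state and count the switches.
        p01_hat = sum(rng.random() < p01 for _ in range(n_samples)) / n_samples
        p10_hat = sum(rng.random() < p10 for _ in range(n_samples)) / n_samples
        estimates.append(value_function(p01_hat, p10_hat, r0, r1, gamma))
    return estimates


# Hypothetical model parameters (not taken from the paper's data set).
P01, P10, R0, R1, GAMMA = 0.1, 0.2, 1.0, 0.0, 0.9

v_true = value_function(P01, P10, R0, R1, GAMMA)
v_hat = simulate_estimates(P01, P10, R0, R1, GAMMA, n_samples=20, n_trials=2000)

# Empirical bias and variance of the plug-in estimate, per state.
bias = [statistics.fmean(v[s] for v in v_hat) - v_true[s] for s in range(2)]
variance = [statistics.pvariance([v[s] for v in v_hat]) for s in range(2)]

print("true values:", v_true)
print("bias:", bias)
print("variance:", variance)
```

Because the value function is a nonlinear function of the transition probabilities, the plug-in estimate is in general biased even though the estimated probabilities themselves are unbiased. Increasing `n_samples` should shrink both quantities.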

Shie Mannor | J. Tsitsiklis | D. Simester | Peng Sun
