Paper information - Asymptotic analysis of value prediction by well-specified and misspecified models

One of the important theoretical issues in reinforcement learning is to rigorously characterize the statistical properties of value estimators. This study theoretically examines the prediction error of value estimators whose estimated value is a linear function of a parameter. We extend a previously introduced framework of semiparametric statistical inference so that it applies to the analysis of the mean squared error (MSE) between the true value and the predicted value. This analysis allows us to investigate and compare the statistical prediction error of value estimators even when the model is misspecified, i.e., when the value estimator cannot represent the true value for any choice of the parameter. We confirm our theoretical analysis on a toy problem.
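The distinction between well-specified and misspecified linear value models can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 3-state chain `P`, the rewards `R`, the discount `GAMMA`, the two scalar feature maps, and the choice of LSTD as the linear value estimator (the abstract does not say which estimators are compared). The well-specified feature is proportional to the true value by construction, while the constant feature cannot represent the true value for any parameter.

```python
import random

# Hypothetical 3-state Markov reward process (an assumption, not the paper's toy problem).
P = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
R = [0.0, 1.0, 2.0]
GAMMA = 0.9


def true_values(P, R, gamma, sweeps=2000):
    """Solve V = R + gamma * P V by fixed-point iteration."""
    n = len(R)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [R[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
             for s in range(n)]
    return v


def sample_trajectory(P, R, steps, rng, start=0):
    """Return a list of (state, reward, next_state) transitions."""
    out, s = [], start
    for _ in range(steps):
        nxt = rng.choices(range(len(R)), weights=P[s])[0]
        out.append((s, R[s], nxt))
        s = nxt
    return out


def lstd_scalar(transitions, phi, gamma):
    """LSTD with a single feature, so the linear system reduces to theta = b / A."""
    A = sum(phi[s] * (phi[s] - gamma * phi[n]) for s, _, n in transitions)
    b = sum(phi[s] * r for s, r, _ in transitions)
    return b / A


def mse(v_true, phi, theta, transitions):
    """Mean squared error between true and predicted values over visited states."""
    total = sum((v_true[s] - theta * phi[s]) ** 2 for s, _, _ in transitions)
    return total / len(transitions)


rng = random.Random(0)
V = true_values(P, R, GAMMA)
data = sample_trajectory(P, R, 20000, rng)

phi_well = [0.5 * v for v in V]   # true value lies in the span: well-specified
phi_mis = [1.0, 1.0, 1.0]         # constant prediction only: misspecified

theta_well = lstd_scalar(data, phi_well, GAMMA)
theta_mis = lstd_scalar(data, phi_mis, GAMMA)
mse_well = mse(V, phi_well, theta_well, data)
mse_mis = mse(V, phi_mis, theta_mis, data)

print("well-specified: theta =", theta_well, "MSE =", mse_well)
print("misspecified:   theta =", theta_mis, "MSE =", mse_mis)
```

In this sketch the well-specified MSE comes only from estimation noise and shrinks as the trajectory grows, whereas the misspecified MSE keeps an approximation error that more data cannot remove.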

Shin Ishii | Tsuyoshi Ueno | Shin-ichi Maeda
