Asymptotic analysis of value prediction by well-specified and misspecified models

An important theoretical issue in reinforcement learning is to rigorously characterize the statistical properties of value estimators. This study theoretically examines the prediction error of value estimators whose estimated value is a linear function of a parameter. We extend a previously introduced framework of semiparametric statistical inference so that it applies to the analysis of the mean squared error (MSE) between the true value and the predicted value. This analysis allows us to investigate and compare the statistical prediction error of value estimators when the model is misspecified, i.e., when the value estimator cannot represent the true value for any choice of the parameter. We confirm our theoretical analysis on a toy problem.
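
To make the quantities above concrete, the setting can be sketched in the following notation. The symbols are assumptions introduced here for illustration and are not taken from the text: \phi is a feature map, \theta the parameter, V the true value function, \hat{\theta} an estimate of the parameter, and \mu a distribution over states. The exact weighting used in the MSE is likewise an assumption. The block requires the amsmath and amssymb packages.

```latex
% Illustrative notation only; all symbols are assumed, not the authors' own.
\begin{align*}
  V_{\theta}(s) &= \phi(s)^{\top}\theta, \\
  \mathrm{MSE}(\hat{\theta}) &= \mathbb{E}_{s \sim \mu}\!\left[\bigl(V(s) - V_{\hat{\theta}}(s)\bigr)^{2}\right], \\
  \text{well-specified:}\quad & \exists\,\theta^{*} \ \text{such that}\ V_{\theta^{*}}(s) = V(s) \ \text{for all } s, \\
  \text{misspecified:}\quad & V_{\theta} \neq V \ \text{for every } \theta .
\end{align*}
```

Under this reading, a misspecified model keeps a nonzero approximation error in the MSE even as the number of samples grows, whereas in the well-specified case the MSE is driven only by the estimation error of \hat{\theta}.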