Asymptotic analysis of value prediction by well-specified and misspecified models
[1] Dimitri P. Bertsekas, et al. Stochastic optimal control: the discrete time case, 2007.
[2] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[3] Klaus Obermayer, et al. The optimal unbiased value estimator and its relation to LSTD, TD and MC, 2010, Machine Learning.
[4] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[5] G. Kitagawa, et al. Generalised information criteria in model selection, 1996.
[6] Michael I. Jordan, et al. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators, 2008, ICML '08.
[7] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[8] Motoaki Kawanabe, et al. Generalized TD Learning, 2011, J. Mach. Learn. Res.
[9] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[10] H. Akaike. A new look at the statistical model identification, 1974.
[11] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.