Finite-sample analysis of least-squares policy iteration
[1] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[2] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[3] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[4] Bin Yu. Rates of convergence for empirical processes of stationary mixing sequences, 1994.
[5] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[6] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[8] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[9] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[10] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[11] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[12] Adam Krzyzak, et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer Series in Statistics.
[13] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, Journal of Machine Learning Research.
[14] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[15] Ron Meir, et al. Nonparametric Time Series Prediction Through Adaptive Model Selection, 2000, Machine Learning.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Transactions on Neural Networks.
[17] M. Talagrand. The Generic Chaining: Upper and Lower Bounds of Stochastic Processes, 2005.
[18] T. Lai, et al. Pseudo-maximization and self-normalized processes, 2007, arXiv:0709.2233.
[19] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[20] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[21] V. Peña, et al. Exponential inequalities for self-normalized processes with applications, 2009.
[22] Bruno Scherrer, et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, 2010, ICML.
[23] Dimitri P. Bertsekas, et al. Error Bounds for Approximations from Projected Linear Equations, 2010, Mathematics of Operations Research.
[24] S. Delattre, et al. Nonparametric regression with martingale increment errors, 2010, arXiv:1010.6209.
[25] Huizhen Yu, et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions, 2010, ICML.
[26] Alessandro Lazaric, et al. Finite-Sample Analysis of LSTD, 2010, ICML.
[27] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[28] Alessandro Lazaric, et al. LSTD with Random Projections, 2010, NIPS.
[29] Matthew W. Hoffman, et al. Finite-Sample Analysis of Lasso-TD, 2011, ICML.
[30] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[31] Bruno Scherrer, et al. Classification-based Policy Iteration with a Critic, 2011, ICML.
[32] Sham M. Kakade, et al. Random Design Analysis of Ridge Regression, 2012, COLT.
[33] Csaba Szepesvári, et al. Statistical linear estimation with penalized estimators: an application to reinforcement learning, 2012, ICML.