An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
Ronald Parr | Lihong Li | Gavin Taylor | Christopher Painter-Wakefield | Michael L. Littman