Online exploration in least-squares policy iteration
[1] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[2] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[3] R. Simmons, et al. The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms, 2004, Machine Learning.
[4] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[5] Jon Louis Bentley, et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time, 1977, TOMS.
[6] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2005, Machine Learning.
[7] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[8] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[10] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.
[11] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[12] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[13] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[14] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[15] Michael L. Littman, et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.
[16] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse, 1996.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] Sebastian Thrun, et al. The role of exploration in learning control, 1992.
[19] J. Tsitsiklis, et al. An optimal one-way multigrid algorithm for discrete-time stochastic control, 1991.
[20] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[21] John Langford, et al. Exploration in Metric State Spaces, 2003, ICML.
[22] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[23] Csaba Szepesvari, et al. Learning near-optimal policies with fitted policy iteration and a single sample path, 2005.
[24] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[25] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[26] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[27] Michael L. Littman, et al. Multi-resolution Exploration in Continuous Spaces, 2008, NIPS.