Efficient Value-Function Approximation via Online Linear Regression
[1] Lihong Li,et al. Analyzing feature generation for value-function approximation , 2007, ICML '07.
[2] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[3] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[4] Ethan Bernstein. Absolute error bounds for learning linear functions online , 1992, COLT '92.
[5] Peter Auer,et al. An Improved On-line Algorithm for Learning Linear Evaluation Functions , 2000, COLT.
[6] Nello Cristianini,et al. Kernel Methods for Pattern Analysis , 2003, ICTAI.
[7] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[8] Philip M. Long,et al. On-line learning of linear functions , 1991, STOC '91.
[9] Donald A. Sofge,et al. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , 1992 .
[10] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[11] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003 .
[12] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .
[13] Ashutosh Kumar Singh,et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2010 .
[14] Sebastian Thrun,et al. Active Exploration in Dynamic Environments , 1991, NIPS.
[15] Stefan Schaal,et al. Reinforcement learning by reward-weighted regression for operational space control , 2007, ICML '07.
[16] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[17] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[18] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[19] Hans Ulrich Simon,et al. From noise-free to noise-tolerant and from on-line to batch learning , 1995, COLT '95.
[20] Rica Gonen,et al. An incentive-compatible multi-armed bandit mechanism , 2007, PODC '07.
[21] Vladimir Vovk,et al. Competitive On-line Linear Regression , 1997, NIPS.
[22] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[23] Andrew G. Barto,et al. Optimal learning: computational procedures for bayes-adaptive markov decision processes , 2002 .
[24] D. Sofge. The Role of Exploration in Learning Control , 1992 .
[25] Philip M. Long. On-line evaluation and prediction using linear functions , 1997, COLT '97.
[26] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables , 1963 .
[27] Reid G. Simmons,et al. The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms , 2005, Machine Learning.
[28] John N. Tsitsiklis,et al. The complexity of dynamic programming , 1989, J. Complex..
[29] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[30] Michael L. Littman,et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning , 2007, NIPS.
[31] Manfred K. Warmuth,et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput..
[32] Philip M. Long,et al. Reinforcement Learning with Immediate Rewards and Linear Hypotheses , 2003, Algorithmica.
[33] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[34] John Langford,et al. Exploration in Metric State Spaces , 2003, ICML.
[35] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[36] Philip M. Long,et al. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent , 1996, IEEE Trans. Neural Networks.