Reducing reinforcement learning to KWIK online regression
[1] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations , 1952 .
[2] John N. Tsitsiklis,et al. The complexity of dynamic programming , 1989, J. Complex..
[3] D. Sofge. THE ROLE OF EXPLORATION IN LEARNING CONTROL , 1992 .
[4] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[5] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[6] Umesh V. Vazirani,et al. An Introduction to Computational Learning Theory , 1994 .
[7] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[8] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[9] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[10] Andrew G. Barto,et al. Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes , 2002 .
[11] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[12] Robert Givan,et al. Approximate Policy Iteration with a Policy Language Bias , 2003, NIPS.
[13] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003 .
[14] John Langford,et al. Exploration in Metric State Spaces , 2003, ICML.
[15] J. Langford,et al. Reducing T-step reinforcement learning to classification , 2003 .
[16] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[17] Jeff G. Schneider,et al. Policy Search by Dynamic Programming , 2003, NIPS.
[18] Michail G. Lagoudakis,et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers , 2003, ICML.
[19] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[20] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[21] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[22] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[23] Reid G. Simmons,et al. The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms , 2005, Machine Learning.
[24] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[25] John Langford,et al. Relating reinforcement learning performance to classification performance , 2005, ICML '05.
[26] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[27] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[28] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[29] Peter Stone,et al. Model-Based Exploration in Continuous State Spaces , 2007, SARA.
[30] Michael L. Littman,et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning , 2007, NIPS.
[31] Vadim Bulitko,et al. Focus of Attention in Reinforcement Learning , 2007, J. Univers. Comput. Sci..
[32] Thomas J. Walsh,et al. Knows what it knows: a framework for self-aware learning , 2008, ICML '08.
[33] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[34] Lihong Li,et al. Efficient Value-Function Approximation via Online Linear Regression , 2008, ISAIM.
[35] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[36] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[37] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[38] Michael L. Littman,et al. A unifying framework for computational reinforcement learning theory , 2009 .
[39] Lihong Li,et al. Online exploration in least-squares policy iteration , 2009, AAMAS.
[40] Matthieu Geist,et al. Kalman Temporal Differences: The deterministic case , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.