Reinforcement Learning in Finite MDPs: PAC Analysis
Lihong Li | Michael L. Littman | Alexander L. Strehl
[1] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[2] Nick Littlestone,et al. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm , 2004, Machine Learning.
[3] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[4] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[5] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[6] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[7] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables , 1963 .
[8] Reid G. Simmons,et al. The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms , 2005, Machine Learning.
[9] Michael L. Littman,et al. A unifying framework for computational reinforcement learning theory , 2009 .
[10] Colin McDiarmid,et al. Surveys in Combinatorics, 1989: On the method of bounded differences , 1989 .
[11] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[12] Dana Angluin,et al. Queries and concept learning , 1988, Machine Learning.
[13] J. Bather,et al. Multi‐Armed Bandit Allocation Indices , 1990 .
[14] Andrew G. Barto,et al. Local Bandit Approximation for Optimal Learning Problems , 1996, NIPS.
[15] Michael L. Littman,et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning , 2007, NIPS.
[16] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[17] Michael L. Littman,et al. Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.
[18] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[19] Michael L. Littman,et al. Potential-based Shaping in Model-based Reinforcement Learning , 2008, AAAI.
[20] Michael L. Littman,et al. Efficient Structure Learning in Factored-State MDPs , 2007, AAAI.
[21] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[22] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[23] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[24] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[25] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[26] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[27] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[28] Michael L. Littman,et al. Multi-resolution Exploration in Continuous Spaces , 2008, NIPS.
[29] Thomas J. Walsh,et al. Exploring compact reinforcement-learning representations with linear regression , 2009, UAI.
[30] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.
[31] Thomas J. Walsh,et al. Efficient Exploration With Latent Structure , 2005, Robotics: Science and Systems.
[32] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[33] Lihong Li,et al. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning , 2009, ICML '09.
[34] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.
[35] Nicholas Roy,et al. CORL: A Continuous-state Offset-dynamics Reinforcement Learner , 2008, UAI.
[36] E. Ordentlich,et al. Inequalities for the L1 Deviation of the Empirical Distribution , 2003 .
[37] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[38] Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
[39] Michael Kearns,et al. Efficient Reinforcement Learning in Factored MDPs , 1999, IJCAI.
[40] Lihong Li,et al. Online exploration in least-squares policy iteration , 2009, AAMAS.
[41] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[42] Alex M. Andrew,et al. Reinforcement Learning: An Introduction , 1998 .
[43] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[44] Andrew W. Moore,et al. Rates of Convergence for Variable Resolution Schemes in Optimal Control , 2000, ICML.
[45] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[46] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[47] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[48] Lihong Li,et al. Incremental Model-based Learners With Formal Learning-Time Guarantees , 2006, UAI.