On the Sample Complexity of Reinforcement Learning with a Generative Model
[1] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[2] H. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012.
[3] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[4] Hilbert J. Kappen, et al. Speedy Q-Learning, 2011, NIPS.
[5] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] John N. Tsitsiklis, et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, 2004, J. Mach. Learn. Res..
[7] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[8] Sham M. Kakade, et al. On the Sample Complexity of Reinforcement Learning, 2003.
[9] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[10] Csaba Szepesvári, et al. Model-based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds, 2010, ICML.
[11] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res..
[12] Claudio De Persis, et al. Proceedings of the 38th IEEE Conference on Decision and Control, 1999.
[13] Rafael Castro-Linares, et al. Trajectory Tracking for Non-Holonomic Cars: A Linear Approach to Controlled Leader-Follower Formation, 2010, 49th IEEE Conference on Decision and Control (CDC).
[14] Csaba Szepesvári. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[15] Ambuj Tewari, et al. REGAL: A Regularization Based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[16] R. Munos, et al. Influence and Variance of a Markov Chain: Application to Adaptive Discretization in Optimal Control, 1999, Proceedings of the 38th IEEE Conference on Decision and Control.
[17] Satinder Singh, et al. An Upper Bound on the Loss from Approximate Optimal-Value Functions, 1994, Machine Learning.
[18] Marco Wiering, et al. Reinforcement Learning, 2014, Adaptation, Learning, and Optimization.
[19] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[20] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[21] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[22] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[23] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res..
[24] Peter Vrancx, et al. Reinforcement Learning: State-of-the-Art, 2012.
[25] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.
[26] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[27] Torben Hagerup, et al. A Guided Tour of Chernoff Bounds, 1990, Inf. Process. Lett..
[28] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res..
[29] H. Kappen, et al. Reinforcement Learning with a Near Optimal Rate of Convergence, 2011.
[30] M. J. Sobel. The Variance of Discounted Markov Decision Processes, 1982.