Near-Bayesian exploration in polynomial time
[1] A. A. Feldbaum, et al. Dual Control Theory, IV, 1961.
[2] E. Slud. Distribution Inequalities for the Binomial Law, 1977.
[3] P. W. Jones, et al. Multi-armed Bandit Allocation Indices, 1989.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.
[6] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[7] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[8] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[9] Sham M. Kakade, et al. On the Sample Complexity of Reinforcement Learning, 2003.
[10] John Langford, et al. Exploration in Metric State Spaces, 2003, ICML.
[11] Heinz Unbehauen, et al. Adaptive Dual Control: Theory and Applications, 2004.
[12] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[13] Tao Wang, et al. Bayesian Sparse Sampling for On-line Reward Optimization, 2005, ICML.
[14] Lihong Li, et al. PAC Model-Free Reinforcement Learning, 2006, ICML.
[15] Jesse Hoey, et al. An Analytic Solution to Discrete Bayesian Reinforcement Learning, 2006, ICML.
[16] Peter Auer, et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, 2006, NIPS.
[17] Michael L. Littman, et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.
[18] Michael L. Littman, et al. An Analysis of Model-Based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[19] Nicholas Roy, et al. CORL: A Continuous-state Offset-dynamics Reinforcement Learner, 2008, UAI.
[20] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.