Optimism in the Face of Uncertainty Should be Refutable
暂无分享,去创建一个
[1] K. Popper,et al. Logik der Forschung , 1935 .
[2] K. Cocks. Discrete Stochastic Programming , 1968 .
[3] J. Kemeny,et al. Denumerable Markov chains , 1969 .
[4] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[5] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[6] Michael L. Littman,et al. An empirical evaluation of interval estimation for Markov decision processes , 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.
[7] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[8] Peter Auer,et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning , 2006, NIPS.
[9] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[10] U. Rieder,et al. Markov Decision Processes , 2010 .