Paper information - Optimism in the Face of Uncertainty Should be Refutable

We give an example from the theory of Markov decision processes which shows that the “optimism in the face of uncertainty” heuristic may fail to make any progress. This is due to the impossibility of falsifying a belief that a (transition) probability is larger than 0. Our example shows the utility of Popper’s demand for falsifiability of hypotheses in the area of artificial intelligence.
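
The falsifiability point can be illustrated with a small sketch. It does not reproduce the paper's own example; it assumes, purely for illustration, that an optimistic learner keeps a one-sided exact binomial (Clopper-Pearson style) upper confidence bound on a transition probability that has never been observed. The function name `upper_bound_zero_successes` and the confidence parameter `alpha` are hypothetical choices, not taken from the paper.

```python
def upper_bound_zero_successes(n: int, alpha: float = 0.05) -> float:
    """One-sided upper confidence bound on p after n trials with 0 successes.

    Assumption for illustration: the bound is the largest p for which
    observing zero successes in n trials still has probability >= alpha,
    i.e. the solution of (1 - p)**n = alpha.
    """
    if n <= 0:
        # No data at all: every probability in [0, 1] is still plausible.
        return 1.0
    return 1.0 - alpha ** (1.0 / n)


if __name__ == "__main__":
    # The bound shrinks as evidence accumulates but never reaches 0,
    # so the belief "p > 0" is never refuted by any finite sample.
    for n in (1, 10, 1_000, 1_000_000):
        print(n, upper_bound_zero_successes(n))
```

Under this assumed bound, no finite number of observations drives the plausible set down to exactly 0, which is the sense in which a belief that a probability is larger than 0 cannot be falsified.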

Ronald Ortner
