The Convergence of Value Iteration in Discounted Markov Decision Processes

Abstract

Considerable numerical experience indicates that the standard value iteration procedure for infinite-horizon discounted Markov decision processes performs much better than the usual error bound analysis suggests. This paper attempts to explain why. To examine why some states exhibit better convergence behaviour than others, it supplements the usual maximum-norm convergence concept with a pointwise convergence concept. We also present some numerical results.
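The gap between the maximum-norm error bound and per-state behaviour can be illustrated with a small sketch. It is not the paper's method. It runs standard value iteration on a hypothetical 3-state MDP and computes the usual a posteriori bound ||v_n - v*|| <= gamma/(1 - gamma) * ||v_n - v_{n-1}||. It also computes a simple per-state "settling" diagnostic, `settle_iterations`, which is an assumption made for illustration and not the pointwise convergence concept defined in the paper. The MDP encoding (`P`, `R`) and all function names are likewise hypothetical.

```python
from typing import List, Tuple

# Hypothetical encoding: P[s][a] lists (probability, next_state) pairs,
# R[s][a] is the one-step reward for taking action a in state s.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def bellman(v: List[float], P: Transitions, R: Rewards, gamma: float) -> List[float]:
    """One application of the Bellman optimality operator (maximising reward)."""
    return [
        max(R[s][a] + gamma * sum(p * v[t] for p, t in P[s][a]) for a in range(len(P[s])))
        for s in range(len(P))
    ]


def value_iteration(P: Transitions, R: Rewards, gamma: float, n_iter: int) -> List[List[float]]:
    """Standard value iteration from v_0 = 0; returns the iterates v_0, ..., v_n."""
    v = [0.0] * len(P)
    history = [v]
    for _ in range(n_iter):
        v = bellman(v, P, R, gamma)
        history.append(v)
    return history


def max_norm_bound(v_new: List[float], v_old: List[float], gamma: float) -> float:
    """Usual a posteriori bound: ||v_n - v*|| <= gamma / (1 - gamma) * ||v_n - v_{n-1}||."""
    return gamma / (1.0 - gamma) * max(abs(a - b) for a, b in zip(v_new, v_old))


def settle_iterations(history: List[List[float]], v_star: List[float], eps: float) -> List[int]:
    """Per state s, the first n with |v_m(s) - v*(s)| <= eps for every later m in the run.

    Illustrative diagnostic only (hypothetical); returns len(history) for
    states that never settle within the run.
    """
    result = []
    for s, target in enumerate(v_star):
        first = len(history)
        # Walk backwards from the last iterate while the state stays within eps.
        for n in range(len(history) - 1, -1, -1):
            if abs(history[n][s] - target) > eps:
                break
            first = n
        result.append(first)
    return result


# Toy 3-state example (hypothetical): state 0 is absorbing with reward 0,
# state 1 moves to state 0 with reward 1, state 2 can stay (reward 1) or
# move to state 0 (reward 5). With gamma = 0.9, v* = [0, 1, 10] by hand.
P = [[[(1.0, 0)]], [[(1.0, 0)]], [[(1.0, 2)], [(1.0, 0)]]]
R = [[0.0], [1.0], [1.0, 5.0]]
gamma = 0.9
v_star = [0.0, 1.0, 10.0]

history = value_iteration(P, R, gamma, n_iter=200)
print("bound after 20 sweeps:", max_norm_bound(history[20], history[19], gamma))
print("sweeps to settle per state:", settle_iterations(history, v_star, eps=1e-3))
```

In this toy example, states 0 and 1 reach their optimal values exactly after zero and one sweeps respectively. State 2 needs roughly 80 sweeps to come within 1e-3. The maximum-norm bound is driven entirely by state 2, so on its own it says nothing about the states that have already converged.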