A Stopping Rule for Discounted Markov Decision Processes with Finite Action Sets
In a Discounted Markov Decision Process (DMDP) with finite action sets, the Value Iteration Algorithm leads, under suitable conditions, to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed for convergence is of great theoretical and practical interest, since such a bound would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound that depends only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. Other authors obtain constants of this kind using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to satisfy an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.
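For orientation, value iteration on a finite DMDP can be sketched as follows. This is a minimal illustration, not the stopping rule derived in the paper: it halts with the classical sup-norm test on successive iterates, which depends on the iterates themselves rather than only on the structure of the process. The array layout (`P` indexed by action, state, next state; `r` by state, action), the function names, the tolerance, and the two-state example are all assumptions made for the sketch.

```python
import numpy as np

def greedy(P, r, beta, v):
    """Return (improved values, greedy policy) for the value vector v."""
    # q[s, a] = r[s, a] + beta * sum_t P[a, s, t] * v[t]
    q = r + beta * np.einsum("ast,t->sa", P, v)
    return q.max(axis=1), q.argmax(axis=1)

def value_iteration(P, r, beta, tol=1e-8, max_iter=10_000):
    """Iterate the Bellman operator until successive iterates are close.

    P: array (A, S, S) of transition matrices, r: array (S, A) of rewards,
    beta: discount factor in (0, 1). The sup-norm threshold below is the
    textbook heuristic, assumed here in place of the paper's bound.
    """
    v = np.zeros(r.shape[0])
    threshold = tol * (1.0 - beta) / (2.0 * beta)
    for n in range(1, max_iter + 1):
        v_new, _ = greedy(P, r, beta, v)
        if np.max(np.abs(v_new - v)) < threshold:
            # Report the policy that is greedy with respect to the final iterate.
            _, policy = greedy(P, r, beta, v_new)
            return v_new, policy, n
        v = v_new
    _, policy = greedy(P, r, beta, v)
    return v, policy, max_iter

# Hypothetical two-state example: state 1 is absorbing and pays 1 per step;
# in state 0, action 0 stays put (reward 0) and action 1 moves to state 1.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
])
r = np.array([[0.0, 0.0], [1.0, 1.0]])
values, policy, steps = value_iteration(P, r, beta=0.9)
```

A structural bound of the kind the abstract describes would replace the iterate-dependent test with a step count fixed before the iteration starts.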