Paper information - On the Speed of Convergence of Value Iteration on Stochastic Shortest-Path Problems

We establish a bound on the convergence time of the value iteration algorithm on stochastic shortest-path problems. The bound, which applies to admissible initial vectors such as $J \equiv 0$, implies polynomial-time convergence of value iteration for all problems in which $\Vert J^* \Vert / \underline{g}$ is polynomially bounded. This result gives a partial answer to the open problem of bounding the convergence time of value iteration for arbitrary initial vectors. The proof is obtained by analyzing a stochastic process associated with the shortest-path problem.
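As a rough illustration of the setting described in the abstract, the following minimal sketch runs value iteration on a tiny stochastic shortest-path problem, starting from the initial vector $J \equiv 0$. The problem instance, the function name `value_iteration`, the data layout, and the stopping tolerance `eps` are all assumptions made for this example; none of them come from the paper, and the sketch does not reproduce the paper's bound or its proof.

```python
# Hypothetical toy problem: states 0 and 1, with absorbing cost-free goal state 2.
# Each state maps to a list of actions; an action is (cost, {successor: probability}).
GOAL = 2
SSP = {
    0: [
        (1.0, {GOAL: 0.5, 0: 0.5}),  # cheap but unreliable: expected total cost 2
        (3.0, {GOAL: 1.0}),          # expensive but certain: total cost 3
    ],
    1: [
        (1.0, {0: 1.0}),             # move to state 0 at cost 1
    ],
}


def value_iteration(transitions, goal, eps=1e-9, max_iters=100_000):
    """Run value iteration from J = 0 until the largest update falls below eps.

    Returns the final value vector (as a dict) and the number of sweeps used.
    """
    # Initial vector J = 0; the goal keeps value 0 throughout.
    J = {s: 0.0 for s in transitions}
    J[goal] = 0.0

    for sweep in range(1, max_iters + 1):
        new_J = {goal: 0.0}
        for s, actions in transitions.items():
            # Bellman update: minimize immediate cost plus expected cost-to-go.
            new_J[s] = min(
                cost + sum(p * J[t] for t, p in successors.items())
                for cost, successors in actions
            )
        residual = max(abs(new_J[s] - J[s]) for s in new_J)
        J = new_J
        if residual < eps:
            return J, sweep

    raise RuntimeError("value iteration did not converge within max_iters sweeps")


J, sweeps = value_iteration(SSP, goal=GOAL)
print(J, sweeps)
```

In this toy instance the optimal costs are 2 for state 0 and 3 for state 1, and the iterates increase toward them from below because all action costs are positive and the starting vector is zero. The number of sweeps reported depends on the chosen `eps`, which is a property of this sketch rather than of the result stated in the abstract.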

Blai Bonet
