Paper information - Exponential Lower Bounds for Policy Iteration

We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown that policy-iteration-style algorithms have exponential lower bounds in a two-player game setting. We extend these lower bounds to Markov decision processes with the total-reward and average-reward optimality criteria.

John Fearnley
