Discounted deterministic Markov decision processes and discounted all-pairs shortest paths

We present algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs). Our fastest algorithm has a worst-case running time of <i>O</i>(<i>mn</i>), improving the recent bound of <i>O</i>(<i>mn</i><sup>2</sup>) obtained by Andersson and Vorobyov [2006]. We also present a randomized <i>O</i>(<i>m</i><sup>1/2</sup><i>n</i><sup>2</sup>)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving an <i>O</i>(<i>mn</i><sup>2</sup>)-time algorithm that can be obtained using ideas of Papadimitriou and Tsitsiklis [1987].
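To make the problem statement concrete, here is a minimal sketch of plain value iteration on a small discounted DMDP. It is not the <i>O</i>(<i>mn</i>) algorithm of the paper, and it carries no comparable running-time guarantee. It assumes the DMDP is given as a directed graph whose edges carry costs, that the goal is to minimize total discounted cost, and that the discount factor lies strictly between 0 and 1. The function name `value_iteration`, its parameters, and the example graph are hypothetical and chosen only for illustration.

```python
def value_iteration(n, edges, discount, tol=1e-12, max_iters=100_000):
    """Approximate optimal values and a greedy policy for a discounted DMDP.

    Illustrative sketch only (NOT the paper's algorithm).
    n        -- number of vertices, labelled 0..n-1 (assumed encoding)
    edges    -- list of (u, w, cost): taking edge u -> w incurs `cost`
    discount -- discount factor, assumed to satisfy 0 < discount < 1
    """
    if not 0.0 < discount < 1.0:
        raise ValueError("discount must lie strictly between 0 and 1")

    # Adjacency lists: out[u] holds (successor, cost) pairs.
    out = [[] for _ in range(n)]
    for u, w, cost in edges:
        out[u].append((w, cost))
    if any(not adj for adj in out):
        raise ValueError("every vertex needs at least one outgoing edge")

    values = [0.0] * n
    for _ in range(max_iters):
        # Bellman update: cheapest immediate cost plus discounted future value.
        new_values = [
            min(cost + discount * values[w] for w, cost in out[u])
            for u in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # Greedy policy: for each vertex, the successor achieving the minimum.
    policy = [
        min(out[u], key=lambda e: e[1] + discount * values[e[0]])[0]
        for u in range(n)
    ]
    return values, policy


# Hypothetical 3-vertex example with discount factor 1/2.
example_edges = [
    (0, 0, 1.0),  # self-loop at vertex 0
    (0, 1, 0.0),
    (1, 1, 2.0),  # self-loop at vertex 1
    (1, 2, 3.0),
    (2, 2, 0.0),  # free self-loop at vertex 2
]
values, policy = value_iteration(3, example_edges, 0.5)
print(values, policy)
```

In this example the cheapest strategy from vertex 0 moves to vertex 1 and then to the free self-loop at vertex 2. Value iteration only converges to the optimal values in the limit, which is one reason algorithms with running times bounded purely in terms of <i>n</i> and <i>m</i>, such as those in the paper, are of interest.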

Mikkel Thorup | Uri Zwick | Omid Madani

[1] Alfred V. Aho,et al. The Design and Analysis of Computer Algorithms , 1974 .

[2] Omid Madani. On policy iteration as a Newton's method and polynomial policy iteration algorithms , 2002, AAAI/IAAI.

[3] Edith Cohen,et al. Improved algorithms for linear inequalities with two variables per inequality , 1991, STOC '91.

[4] Yinyu Ye,et al. A New Complexity Result on Solving the Markov Decision Problem , 2005, Math. Oper. Res..

[5] S. Vorobyov,et al. Fast Algorithms for Monotonic Discounted Linear Programs with Two Variables per Inequality , 2006 .

[6] Richard M. Anderson,et al. Complexity results for infinite-horizon Markov decision processes , 2000 .

[7] Robert E. Tarjan,et al. Fibonacci heaps and their uses in improved network optimization algorithms , 1984, JACM.

[8] Narendra Karmarkar,et al. A new polynomial-time algorithm for linear programming , 1984, STOC '84.

[9] Uri Zwick,et al. The Complexity of Mean Payoff Games on Graphs , 1996, Theor. Comput. Sci..

[10] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..

[11] Uri Zwick,et al. All pairs shortest paths using bridging sets and rectangular matrix multiplication , 2000, JACM.

[12] Joseph Naor,et al. Simple and Fast Algorithms for Linear and Integer Programs With Two Variables per Inequality , 1994, SIAM J. Comput..

[13] Anne Condon,et al. On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes , 1994, INFORMS J. Comput..

[14] Ali Dasdan,et al. An Experimental Study of Minimum Mean Cycle Algorithms , 1998 .

[15] Nir Halman,et al. Simple Stochastic Games, Parity Games, Mean Payoff Games and Discounted Payoff Games Are All LP-Type Problems , 2007, Algorithmica.

[16] J. W. Nieuwenhuis,et al. Boekbespreking van D.P. Bertsekas (ed.), Dynamic programming and optimal control - volume 2 , 1999 .

[17] F. d'Epenoux,et al. A Probabilistic Production and Inventory Problem , 1963 .

[18] Leslie Pack Kaelbling,et al. On the Complexity of Solving Markov Decision Problems , 1995, UAI.

[19] Henrik Björklund,et al. Combinatorial structure and randomized subexponential algorithms for infinite games , 2005, Theor. Comput. Sci..

[20] A. Ehrenfeucht,et al. Positional strategies for mean payoff games , 1979 .

[21] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[22] Michael L. Littman,et al. Algorithms for Sequential Decision Making , 1996 .

[23] Omid Madani,et al. Polynomial Value Iteration Algorithms for Deterministic MDPs , 2002, UAI.

[24] Robert E. Tarjan,et al. Faster parametric shortest path and minimum-balance algorithms , 1991, Networks.

[25] Yishay Mansour,et al. On the Complexity of Policy Iteration , 1999, UAI.

[26] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.

[27] Andrew V. Goldberg,et al. An Experimental Study of Minimum Mean Cycle Algorithms , 2009, ALENEX.

[28] Lenore Blum,et al. Complexity and Real Computation , 1997, Springer New York.

[29] Anne Condon,et al. The Complexity of Stochastic Games , 1992, Inf. Comput..

[30] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[31] Mihalis Yannakakis,et al. High-Probability Parallel Transitive-Closure Algorithms , 1991, SIAM J. Comput..

[32] Thomas H. Cormen,et al. Introduction to algorithms [2nd ed.] , 2001 .

[33] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[34] R. Bellman,et al. Dynamic Programming and Markov Processes , 1960 .

[35] Richard M. Karp,et al. A characterization of the minimum cycle mean in a digraph , 1978, Discret. Math..

[36] R. K. Shyamasundar,et al. Introduction to algorithms , 1996 .

[37] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[38] A. Karzanov,et al. Cyclic games and an algorithm to find minimax cycle means in directed graphs , 1990 .

[39] Ali Dasdan,et al. Experimental analysis of the fastest optimum cycle ratio and mean algorithms , 2004, TODE.

[40] U. Rieder,et al. Markov Decision Processes , 2010 .

[41] Walter Ludwig,et al. A Subexponential Randomized Algorithm for the Simple Stochastic Game Problem , 1995, Inf. Comput..

[42] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.

[43] Cyrus Derman,et al. Finite State Markovian Decision Processes , 1970 .