Discounted deterministic Markov decision processes and discounted all-pairs shortest paths

We present algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs). Our fastest algorithm has a worst-case running time of <i>O</i>(<i>mn</i>), improving the recent bound of <i>O</i>(<i>mn</i><sup>2</sup>) obtained by Andersson and Vorobyov [2006]. We also present a randomized <i>O</i>(<i>m</i><sup>1/2</sup><i>n</i><sup>2</sup>)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving an <i>O</i>(<i>mn</i><sup>2</sup>)-time algorithm that can be obtained using ideas of Papadimitriou and Tsitsiklis [1987].
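To make the problem statement concrete, the following is a minimal sketch of a discounted DMDP solved by plain value iteration. It is not the <i>O</i>(<i>mn</i>) algorithm announced above, only a naive baseline for illustration. The names `value_iteration`, `graph`, `discount`, and `tol` are hypothetical, and the sketch assumes a cost-minimizing formulation in which each of the n states has at least one of the m outgoing edges and the discount factor lies strictly between 0 and 1.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each state maps to its outgoing edges,
# and each edge is (successor state, immediate cost).
Edge = Tuple[int, float]


def value_iteration(
    graph: Dict[int, List[Edge]],
    discount: float,
    tol: float = 1e-12,
    max_iters: int = 100_000,
) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Approximate the optimal discounted cost of every state.

    Repeatedly applies the update
        V(u) = min over edges (u, v, c) of  c + discount * V(v)
    until the largest change falls below `tol` (or `max_iters` is hit).
    Returns the values and a greedy policy (chosen successor per state).
    """
    if not 0.0 < discount < 1.0:
        raise ValueError("discount must lie strictly between 0 and 1")

    values = {u: 0.0 for u in graph}
    for _ in range(max_iters):
        new_values = {
            u: min(cost + discount * values[v] for v, cost in edges)
            for u, edges in graph.items()
        }
        delta = max(abs(new_values[u] - values[u]) for u in graph)
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        u: min(edges, key=lambda e: e[1] + discount * values[e[0]])[0]
        for u, edges in graph.items()
    }
    return values, policy
```

Each sweep touches every edge once, so a sweep costs time proportional to m. The number of sweeps depends on the discount factor and the tolerance, which is why this baseline is not strongly polynomial in the way a bound such as <i>O</i>(<i>mn</i>) is.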

[1]  Alfred V. Aho,et al.  The Design and Analysis of Computer Algorithms , 1974 .

[2]  Omid Madani.  On policy iteration as a Newton's method and polynomial policy iteration algorithms , 2002, AAAI/IAAI.

[3]  Edith Cohen,et al.  Improved algorithms for linear inequalities with two variables per inequality , 1991, STOC '91.

[4]  Yinyu Ye,et al.  A New Complexity Result on Solving the Markov Decision Problem , 2005, Math. Oper. Res..

[5]  S. Vorobyov,et al.  Fast Algorithms for Monotonic Discounted Linear Programs with Two Variables per Inequality , 2006 .

[6]  Richard M. Anderson,et al.  Complexity results for infinite-horizon Markov decision processes , 2000 .

[7]  Robert E. Tarjan,et al.  Fibonacci heaps and their uses in improved network optimization algorithms , 1984, JACM.

[8]  Narendra Karmarkar,et al.  A new polynomial-time algorithm for linear programming , 1984, STOC '84.

[9]  Uri Zwick,et al.  The Complexity of Mean Payoff Games on Graphs , 1996, Theor. Comput. Sci..

[10]  John N. Tsitsiklis,et al.  The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..

[11]  Uri Zwick,et al.  All pairs shortest paths using bridging sets and rectangular matrix multiplication , 2000, JACM.

[12]  Joseph Naor,et al.  Simple and Fast Algorithms for Linear and Integer Programs With Two Variables per Inequality , 1994, SIAM J. Comput..

[13]  Anne Condon,et al.  On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes , 1994, INFORMS J. Comput..

[14]  Ali Dasdan,et al.  An Experimental Study of Minimum Mean Cycle Algorithms , 1998 .

[15]  Nir Halman,et al.  Simple Stochastic Games, Parity Games, Mean Payoff Games and Discounted Payoff Games Are All LP-Type Problems , 2007, Algorithmica.

[16]  J. W. Nieuwenhuis,et al.  Boekbespreking van D.P. Bertsekas (ed.), Dynamic programming and optimal control - volume 2 , 1999 .

[17]  F. d'Epenoux,et al.  A Probabilistic Production and Inventory Problem , 1963 .

[18]  Leslie Pack Kaelbling,et al.  On the Complexity of Solving Markov Decision Problems , 1995, UAI.

[19]  Henrik Björklund,et al.  Combinatorial structure and randomized subexponential algorithms for infinite games , 2005, Theor. Comput. Sci..

[20]  A. Ehrenfeucht,et al.  Positional strategies for mean payoff games , 1979 .

[21]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[22]  Michael L. Littman,et al.  Algorithms for Sequential Decision Making , 1996 .

[23]  Omid Madani,et al.  Polynomial Value Iteration Algorithms for Deterministic MDPs , 2002, UAI.

[24]  Robert E. Tarjan,et al.  Faster parametric shortest path and minimum-balance algorithms , 1991, Networks.

[25]  Yishay Mansour,et al.  On the Complexity of Policy Iteration , 1999, UAI.

[26]  L. Shapley,et al.  Stochastic Games , 1953, Proceedings of the National Academy of Sciences.

[27]  Andrew V. Goldberg,et al.  An Experimental Study of Minimum Mean Cycle Algorithms , 2009, ALENEX.

[28]  Lenore Blum,et al.  Complexity and Real Computation , 1997, Springer New York.

[29]  Anne Condon,et al.  The Complexity of Stochastic Games , 1992, Inf. Comput..

[30]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[31]  Mihalis Yannakakis,et al.  High-Probability Parallel Transitive-Closure Algorithms , 1991, SIAM J. Comput..

[32]  Thomas H. Cormen,et al.  Introduction to algorithms [2nd ed.] , 2001 .

[33]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[34]  R. Bellman,et al.  Dynamic Programming and Markov Processes , 1960 .

[35]  Richard M. Karp,et al.  A characterization of the minimum cycle mean in a digraph , 1978, Discret. Math..

[36]  R. K. Shyamasundar,et al.  Introduction to algorithms , 1996 .

[37]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[38]  A. Karzanov,et al.  Cyclic games and an algorithm to find minimax cycle means in directed graphs , 1990 .

[39]  Ali Dasdan,et al.  Experimental analysis of the fastest optimum cycle ratio and mean algorithms , 2004, TODE.

[40]  U. Rieder,et al.  Markov Decision Processes , 2010 .

[41]  Walter Ludwig,et al.  A Subexponential Randomized Algorithm for the Simple Stochastic Game Problem , 1995, Inf. Comput..

[42]  Sean R Eddy,et al.  What is dynamic programming? , 2004, Nature Biotechnology.

[43]  Cyrus Derman,et al.  Finite State Markovian Decision Processes , 1970 .