Recursive stochastic games with positive rewards

Abstract. We study the complexity of a class of Markov decision processes and, more generally, stochastic games, called 1-exit Recursive Markov Decision Processes (1-RMDPs) and 1-exit Recursive Simple Stochastic Games (1-RSSGs), with strictly positive rewards. These are a class of finitely presented countable-state zero-sum turn-based stochastic games that subsume standard finite-state MDPs and Condon's simple stochastic games. They correspond to optimization and game versions of several classic stochastic models with rewards; in particular, to the MDP and game versions of multi-type branching processes and stochastic context-free grammars with strictly positive rewards. The goal of the two players in the game is to maximize/minimize the total expected reward generated by a play of the game. Such stochastic models arise naturally as models of probabilistic procedural programs with recursion, and the problems we address are motivated by the goal of analyzing the optimal/pessimal expected running time in such a setting. We first show that in such games both players have deterministic "stackless and memoryless" optimal strategies. We then provide polynomial-time algorithms for computing the exact optimal expected reward (which may be infinite, but is otherwise rational), and optimal strategies, for both the maximizing and minimizing single-player versions of the game, i.e., for (1-exit) Recursive Markov Decision Processes (1-RMDPs). It follows that the quantitative decision problem for positive-reward 1-RSSGs is in NP ∩ coNP. We show that Condon's well-known quantitative termination problem for finite-state simple stochastic games (SSGs), which she showed to be in NP ∩ coNP, reduces to a special case of the reward problem for 1-RSSGs, namely, deciding whether the value is ∞. By contrast, for finite-state SSGs with strictly positive rewards, deciding whether the expected reward value is ∞ is solvable in P-time.
We also show that there is a simultaneous strategy improvement algorithm that converges in a finite number of steps to the value and optimal strategies of a 1-RSSG with positive rewards.
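To make the optimization problem concrete: the optimal expected total reward of a maximizing 1-RMDP is the least fixed point of a system of max-linear Bellman equations, where each variable's value is the best choice over moves, and each move contributes its (strictly positive) reward plus a probability-weighted sum over successor variables. The following is a minimal illustrative sketch of naive value iteration on such a system, not the paper's polynomial-time algorithm; the toy equation system `EQS`, the iteration bounds, and the divergence heuristic are all assumptions made for the example.

```python
import math

# Illustrative max-linear Bellman system for a maximizing 1-RMDP with
# strictly positive rewards (toy instance, not from the paper).  Each
# variable i satisfies
#     x[i] = max over choices of (reward + sum of coeff * x[j]),
# where the coefficients encode transition probabilities and the
# recursive call structure.  Iterating from 0 converges monotonically
# to the least fixed point, which may be +infinity.

# variable -> list of choices; a choice is (reward, [(coeff, successor_var)])
EQS = {
    0: [(1.0, [(0.5, 1)]), (2.0, [])],
    1: [(1.0, [(0.9, 0), (0.3, 1)])],
}

def value_iteration(eqs, rounds=100000, tol=1e-9, bound=1e12):
    x = {v: 0.0 for v in eqs}
    for _ in range(rounds):
        y = {v: max(r + sum(c * x[u] for c, u in terms)
                    for r, terms in choices)
             for v, choices in eqs.items()}
        if any(y[v] > bound for v in eqs):
            # heuristic cutoff standing in for a proper infinity test
            return {v: math.inf for v in eqs}
        if all(abs(y[v] - x[v]) < tol for v in eqs):
            return y
        x = y
    return x
```

For this toy instance the iterates increase monotonically and converge (the spectral radius of the optimal choice's coefficient matrix is below 1); if a strategy's coefficient matrix had spectral radius at least 1, the positive rewards would force the values to diverge to ∞, which the cutoff crudely detects.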

[1]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[2]  Helmut Seidl,et al.  Solving systems of rational equations through strategy iteration , 2011, TOPL.

[3]  Peter Whittle,et al.  Growth Optimality for Branching Markov Decision Chains , 1982, Math. Oper. Res..

[4]  Dominik Wojtczak,et al.  Expected Termination Time in BPA Games , 2013, ATVA.

[5]  Dominik Wojtczak,et al.  Recursive probabilistic models : efficient analysis and implementation , 2009 .

[6]  Javier Esparza,et al.  Quantitative analysis of probabilistic pushdown automata: expectations and variances , 2005, 20th Annual IEEE Symposium on Logic in Computer Science (LICS' 05).

[7]  Ashutosh Trivedi,et al.  Timed Branching Processes , 2010, 2010 Seventh International Conference on the Quantitative Evaluation of Systems.

[8]  Oliver Friedmann,et al.  An Exponential Lower Bound for the Latest Deterministic Strategy Iteration Algorithms , 2011, Log. Methods Comput. Sci..

[9]  A. F. Veinott Discrete Dynamic Programming with Sensitive Discount Optimality Criteria , 1969 .

[10]  Kousha Etessami,et al.  On the Complexity of Nash Equilibria and Other Fixed Points , 2010, SIAM J. Comput..

[11]  John Fearnley,et al.  Exponential Lower Bounds for Policy Iteration , 2010, ICALP.

[12]  Kousha Etessami,et al.  Recursive Markov Decision Processes and Recursive Stochastic Games , 2005, ICALP.

[13]  S. Pliska Optimization of Multitype Branching Processes , 1976 .

[14]  R. Karp,et al.  On Nonterminating Stochastic Games , 1966 .

[15]  Xiaotie Deng,et al.  Settling the complexity of computing two-player Nash equilibria , 2007, JACM.

[16]  Vincent D. Blondel,et al.  Undecidable Problems for Probabilistic Automata of Fixed Dimension , 2003, Theory of Computing Systems.

[17]  Kousha Etessami,et al.  Recursive Stochastic Games with Positive Rewards , 2008, ICALP.

[18]  Petr Novotný,et al.  Minimizing Expected Termination Time in One-Counter Markov Decision Processes , 2012, ICALP.

[19]  Tomás Brázdil,et al.  Qualitative reachability in stochastic BPA games , 2011, Inf. Comput..

[20]  Kousha Etessami,et al.  Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations , 2012, ICALP.

[21]  Donald A. Martin,et al.  The determinacy of Blackwell games , 1998, Journal of Symbolic Logic.

[22]  Javier Esparza,et al.  Model checking probabilistic pushdown automata , 2004, Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004..

[23]  Mihalis Yannakakis,et al.  How easy is local search? , 1985, 26th Annual Symposium on Foundations of Computer Science (sfcs 1985).

[24]  Anne Condon,et al.  The Complexity of Stochastic Games , 1992, Inf. Comput..

[25]  Tomás Brázdil,et al.  Runtime analysis of probabilistic programs with unbounded recursion , 2015, J. Comput. Syst. Sci..

[26]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[27]  Anna R. Karlin,et al.  Random walks with “back buttons” (extended abstract) , 2000, STOC '00.

[28]  Paul W. Goldberg,et al.  The Complexity of Computing a Nash Equilibrium , 2009, SIAM J. Comput..

[29]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[30]  Tomás Brázdil,et al.  Reachability in recursive Markov decision processes , 2008, Inf. Comput..

[31]  T. E. Harris,et al.  The Theory of Branching Processes. , 1963 .

[32]  P. Jagers,et al.  Branching Processes: Variation, Growth, and Extinction of Populations , 2005 .

[33]  J. Filar,et al.  Competitive Markov Decision Processes , 1996 .

[34]  Kousha Etessami,et al.  Greatest Fixed Points of Probabilistic Min/Max Polynomial Equations, and Reachability for Branching Markov Decision Processes , 2015, ICALP.