One-counter Markov decision processes

We study the computational complexity of some central analysis problems for One-Counter Markov Decision Processes (OC-MDPs), a class of finitely-presented, countable-state MDPs. OC-MDPs extend finite-state MDPs with an unbounded counter. The counter can be incremented, decremented, or left unchanged during each state transition, and whether a transition is enabled may depend on both the current state and on whether the counter value is 0. Some states are "random": from these, the next transition is chosen according to a given probability distribution. Other states are "controlled": from these, the next transition is chosen by the controller. Different objectives for the controller give rise to different computational problems, aimed at computing optimal achievable objective values and optimal strategies. OC-MDPs are in fact equivalent to a controlled extension of (discrete-time) Quasi-Birth-Death processes (QBDs), a purely stochastic model heavily studied in queueing theory and applied probability. They can thus be viewed as a natural "adversarial" extension of a classic stochastic model. They can also be viewed as a natural probabilistic/controlled extension of classic one-counter automata. OC-MDPs also subsume (as a very restricted special case) a recently studied MDP model called "solvency games", which models a risk-averse gambling scenario. Basic computational questions for OC-MDPs include "termination" questions and "limit" questions, such as the following: does the controller have a strategy to ensure that the counter (which may, for example, count the number of jobs in the queue) will hit value 0 (the empty queue) almost surely (a.s.)? Or that the counter will have lim sup value ∞, a.s.? Or that it will hit value 0 in a selected terminal state, a.s.? Or, if such properties cannot be ensured almost surely, what is their optimal probability over all strategies? We provide new upper and lower bounds on the complexity of such problems.
Specifically, we show that several quantitative and almost-sure limit problems can be answered in polynomial time, and that almost-sure termination problems (without selection of desired terminal states) can also be answered in polynomial time. On the other hand, we show that the almost-sure termination problem with selected terminal states is PSPACE-hard, and we provide an exponential-time algorithm for this problem. We also characterize classes of strategies that suffice for optimality in several of these settings. Our upper bounds combine a number of techniques from the theory of MDP reward models, the theory of random walks, and a variety of automata-theoretic methods.
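To make the model concrete, the following is a minimal illustrative sketch of an OC-MDP and of a Monte Carlo estimate of its termination probability under a fixed strategy. It is not the paper's algorithm: the names `OCMDP`, `run`, `estimate_termination`, `always_safe`, and `always_risky`, the example states, and the probabilities are all assumptions made for illustration. It also simplifies the model by using only the rules for positive counter values and by stopping a run as soon as the counter hits 0.

```python
import random
from dataclasses import dataclass


@dataclass
class OCMDP:
    # random_states[s]: list of (probability, (counter_change, next_state))
    random_states: dict
    # controlled_states[s]: list of (counter_change, next_state) choices
    controlled_states: dict


def run(mdp, strategy, state, counter, rng, max_steps=2000):
    """Simulate one run; return True if the counter hits 0 within max_steps."""
    for _ in range(max_steps):
        if counter == 0:
            return True
        if state in mdp.controlled_states:
            # The controller picks one of the enabled transitions.
            delta, state = strategy(state, counter, mdp.controlled_states[state])
        else:
            # A random state samples its transition from the distribution.
            options = mdp.random_states[state]
            weights = [p for p, _ in options]
            _, (delta, state) = rng.choices(options, weights=weights)[0]
        counter += delta
    return counter == 0


def estimate_termination(mdp, strategy, state, counter, rng, runs=200):
    """Fraction of simulated runs that reach counter value 0 (an estimate only)."""
    hits = sum(run(mdp, strategy, state, counter, rng) for _ in range(runs))
    return hits / runs


# Hypothetical example: the controller at "c" chooses between a queue that
# tends to drain ("safe") and one that tends to grow ("risky").
mdp = OCMDP(
    random_states={
        "safe": [(0.7, (-1, "c")), (0.3, (+1, "c"))],
        "risky": [(0.3, (-1, "c")), (0.7, (+1, "c"))],
    },
    controlled_states={"c": [(0, "safe"), (0, "risky")]},
)


def always_safe(state, counter, choices):
    return choices[0]


def always_risky(state, counter, choices):
    return choices[1]


if __name__ == "__main__":
    rng = random.Random(0)
    print("safe :", estimate_termination(mdp, always_safe, "c", 5, rng))
    print("risky:", estimate_termination(mdp, always_risky, "c", 5, rng))
```

Under `always_safe` the counter has negative drift, so runs terminate almost surely. Under `always_risky` the counter behaves like a biased random walk with upward drift, and the classical gambler's-ruin formula gives a termination probability of (0.3/0.7)^5 from counter value 5. Simulation with a step bound only approximates these values; the sketch shows the questions being asked, not how the paper decides them.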
