Transience in Countable MDPs

The Transience objective is to not visit any state infinitely often. While this is impossible in any finite Markov Decision Process (MDP), it can be satisfied in countably infinite ones, e.g., if the transition graph is acyclic. We prove the following fundamental properties of Transience in countably infinite MDPs.

1. There exist uniformly ε-optimal memoryless deterministic (MD) strategies for Transience, even in infinitely branching MDPs.

2. Optimal strategies for Transience need not exist, even if the MDP is finitely branching. However, if an optimal strategy exists, then there is also an optimal MD strategy.

3. If an MDP is universally transient (i.e., almost surely transient under all strategies), then many other objectives have a lower strategy complexity than in general MDPs. E.g., ε-optimal strategies for Safety and co-Büchi objectives, and optimal strategies for {0, 1, 2}-Parity (where they exist), can be chosen MD, even if the MDP is infinitely branching.

2012 ACM Subject Classification: Theory of computation → Random walks and Markov chains; Mathematics of computing → Probability and statistics
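The Transience objective can be illustrated on the simplest countably infinite Markov chain: a right-biased random walk on the nonnegative integers. This is a sketch, not a construction from the paper; the walk, the bias parameter `p_right`, and the reflecting behaviour at state 0 are all illustrative assumptions. With `p_right > 1/2` the walk drifts to +∞, so almost surely every individual state is visited only finitely often, i.e., the run satisfies Transience.

```python
import random
from collections import Counter

def biased_walk(steps, p_right=0.7, seed=0):
    """Simulate a right-biased random walk on the nonnegative integers.

    Illustrative example (not from the paper): with p_right > 1/2 the
    walk is transient -- it drifts to +infinity, so almost surely no
    state is visited infinitely often. We count visits per state.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sample path
    visits = Counter()
    state = 0
    for _ in range(steps):
        visits[state] += 1
        # Reflect at 0; otherwise step right with probability p_right.
        if state == 0 or rng.random() < p_right:
            state += 1
        else:
            state -= 1
    return visits

visits = biased_walk(100_000)
# The walk wanders far to the right while each individual state
# receives only a small, bounded-looking number of visits.
print(max(visits), max(visits.values()))
```

In a sample run the walk reaches states tens of thousands of steps to the right while no single state accumulates more than a handful of visits, which is exactly the Transience picture: infinitely many states, each seen finitely often. In a finite MDP this is impossible, since some state must recur infinitely often along any infinite run.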
