Life is Random, Time is Not: Markov Decision Processes with Window Objectives

The window mechanism was introduced by Chatterjee et al. to strengthen classical game objectives with time bounds. It allows the synthesis of system controllers that exhibit acceptable behaviors within a configurable time frame, all along their infinite execution, in contrast to traditional objectives, which only require correct behavior in the limit. The window concept has proved its worth in a variety of two-player zero-sum games, both because it enables reasoning about such time bounds in system specifications and because it usually yields increased tractability. In this work, we extend the window framework to stochastic environments by considering Markov decision processes. A fundamental problem in this context is the threshold probability problem: given an objective, it asks for a strategy that guarantees that runs satisfy the objective with probability at least a given threshold. We solve it for the usual variants of window objectives, where either the time frame is set as a parameter or we ask whether such a time frame exists. We develop a generic approach for window-based objectives and instantiate it for the classical mean-payoff and parity objectives, already considered in games. Our work paves the way for wide use of the window mechanism in stochastic models.
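To make the window idea concrete, here is a minimal Python sketch of a window mean-payoff check on a finite prefix of a run, for the variant where the time frame is given as a parameter. It is an illustration only, not the algorithm developed in this work: it assumes integer weights, a threshold of zero, and the usual reading that a window opened at some position "closes" as soon as the sum of weights since that position becomes nonnegative. The names `window_closes` and `fixed_window_holds` are hypothetical.

```python
from typing import Sequence


def window_closes(weights: Sequence[int], start: int, max_len: int) -> bool:
    """Return True if the window opened at `start` closes within `max_len` steps,
    i.e. some partial sum weights[start] + ... + weights[start + l - 1]
    with 1 <= l <= max_len is nonnegative (assumed threshold: zero)."""
    total = 0
    for step in range(start, min(start + max_len, len(weights))):
        total += weights[step]
        if total >= 0:
            return True
    return False


def fixed_window_holds(weights: Sequence[int], max_len: int) -> bool:
    """Check every position of the prefix whose full window is observable.
    Positions too close to the end of the prefix are skipped, since their
    windows could still close later in the (infinite) run."""
    last_checked = len(weights) - max_len
    return all(
        window_closes(weights, start, max_len)
        for start in range(last_checked + 1)
    )


prefix = [1, -1, 2, -2, 3]
print(fixed_window_holds(prefix, 2))
print(fixed_window_holds(prefix, 1))
```

With a time frame of 2, every negative weight in `prefix` is compensated by the next step, so the check succeeds; with a time frame of 1, the position carrying weight -1 already fails. A classical mean-payoff objective, by contrast, would only constrain the long-run average and say nothing about how quickly such deficits are repaid.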
