Trading Performance for Stability in Markov Decision Processes

We study controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize both the expected mean-payoff performance and the stability (also known as variability in the literature). We argue that the basic notion of expressing stability through the statistical variance of the mean payoff is sometimes insufficient, and we propose an alternative definition. We show that a strategy that keeps both the expected mean payoff and the variance below given bounds requires randomization and memory under both definitions. We then show that the problem of finding such a strategy can be expressed as a set of constraints, and we give a complexity classification for the problem of finding an optimal strategy.
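As a toy illustration of the trade-off between expected mean payoff and its variance, the sketch below computes both quantities for a hypothetical three-loop MDP that is not taken from the paper. It assumes that every run ends up in one loop forever, so the mean payoff of a run equals that loop's reward. The names `mean_and_variance` and `mix`, the rewards, and the probabilities are all illustrative assumptions. The example shows only the effect of randomizing the first choice; it does not demonstrate the need for memory.

```python
from fractions import Fraction


def mean_and_variance(outcomes):
    """Return (mean, variance) of a finite distribution.

    outcomes: list of (probability, mean_payoff) pairs whose
    probabilities sum to 1.
    """
    mean = sum(p * x for p, x in outcomes)
    var = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, var


def mix(p_safe):
    """Expected mean payoff and variance when 'safe' is chosen with p_safe.

    Hypothetical toy MDP (an assumption, not from the paper):
      - action 'safe' enters a loop with reward 1 forever;
      - action 'risky' enters, with probability 1/2 each,
        a loop with reward 0 or a loop with reward 4.
    """
    p_safe = Fraction(p_safe)
    p_risky = 1 - p_safe
    return mean_and_variance([
        (p_safe, Fraction(1)),
        (p_risky / 2, Fraction(0)),
        (p_risky / 2, Fraction(4)),
    ])


if __name__ == "__main__":
    for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
        mean, var = mix(p)
        print(f"P(safe)={p}: expected mean payoff={mean}, variance={var}")
```

In this toy model, always playing the risky action gives expected mean payoff 2 with variance 4, always playing the safe action gives 1 with variance 0, and an even mix gives 3/2 with variance 9/4. Intermediate combinations of the two measures are therefore reachable only by randomizing.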
