Simple Strategies in Multi-Objective MDPs (Technical Report)

We consider the verification of multiple expected reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among multiple objectives by obtaining the Pareto front. We focus on strategies that are easy to employ and implement, that is, strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to solve the corresponding problem. The bounded-memory case can be reduced to the stationary one by a product construction. Experimental results using Storm and Gurobi show the feasibility of our algorithms.
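To make the achievability question concrete, the following is a minimal brute-force sketch, not the MILP encoding of the report. It enumerates every pure stationary strategy of a tiny hypothetical MDP and checks whether one of them meets a given point in every objective. The toy model, the names `MDP`, `expected_rewards`, and `achievable_by_pure_stationary`, and the assumptions that all objectives are maximized and that the zero-reward absorbing state is reached almost surely are all illustrative and not taken from the report.

```python
from itertools import product

# Hypothetical toy MDP: MDP[state][action] = (reward_vector, {successor: probability}).
# "goal" is absorbing and collects no reward.
MDP = {
    "s0": {
        "a": ((1.0, 0.0), {"goal": 1.0}),
        "b": ((0.0, 1.0), {"goal": 1.0}),
    },
    "goal": {"stay": ((0.0, 0.0), {"goal": 1.0})},
}


def expected_rewards(mdp, strategy, init, dims, iterations=1000):
    """Approximate the expected total reward vector from `init` in the
    Markov chain induced by a pure stationary strategy (fixed-point iteration)."""
    values = {s: [0.0] * dims for s in mdp}
    for _ in range(iterations):
        updated = {}
        for s in mdp:
            reward, successors = mdp[s][strategy[s]]
            updated[s] = [
                reward[i] + sum(p * values[t][i] for t, p in successors.items())
                for i in range(dims)
            ]
        values = updated
    return tuple(values[init])


def achievable_by_pure_stationary(mdp, init, point, eps=1e-9):
    """Return a pure stationary strategy whose expected rewards are at least
    `point` in every objective, or None if no such strategy exists."""
    states = list(mdp)
    # One action per state: the number of candidates is exponential in the
    # number of states, so this only works for very small models.
    for choice in product(*(list(mdp[s]) for s in states)):
        strategy = dict(zip(states, choice))
        value = expected_rewards(mdp, strategy, init, len(point))
        if all(v >= p - eps for v, p in zip(value, point)):
            return strategy
    return None


if __name__ == "__main__":
    print(achievable_by_pure_stationary(MDP, "s0", (1.0, 0.0)))
    print(achievable_by_pure_stationary(MDP, "s0", (0.5, 0.5)))
```

In this toy model the point (0.5, 0.5) could be reached by randomizing evenly between actions `a` and `b`, but no pure stationary strategy reaches it, which illustrates why restricting to pure strategies changes the set of achievable points. The exhaustive enumeration is only practical for very small models.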
