Multi-cost Bounded Reachability in MDP

We provide an efficient algorithm for multi-objective model-checking problems on Markov decision processes (MDPs) with multiple cost structures. The core problem is to decide whether there exists a scheduler for a given MDP such that all objectives over cost vectors are fulfilled. Reachability and expected-cost objectives are covered and can be freely mixed. An empirical evaluation demonstrates the algorithm's scalability. We also discuss the need for output beyond Pareto curves and exploit the information the algorithm makes available to support decision makers.
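To make the kind of query concrete, the following is a minimal sketch, not the paper's algorithm: a single-cost special case of bounded reachability, computed by dynamic programming over pairs of state and remaining budget. The toy MDP, its state and action names, and the helper `max_reach_prob` are all illustrative assumptions; the multi-cost setting additionally tracks one budget per cost structure.

```python
from functools import lru_cache

# Toy MDP (illustrative): each action maps to a list of
# (probability, successor, cost) triples.
MDP = {
    "s0": {
        "fast": [(0.5, "goal", 1), (0.5, "s0", 1)],   # cheap but unreliable
        "slow": [(0.9, "goal", 3), (0.1, "s0", 3)],   # costly but likely
    },
    "goal": {},  # absorbing target state
}

@lru_cache(maxsize=None)
def max_reach_prob(state, budget):
    """Maximum probability of reaching 'goal' with accumulated cost <= budget."""
    if state == "goal":
        return 1.0
    if budget <= 0:
        return 0.0
    best = 0.0
    for successors in MDP[state].values():
        # Branches whose cost exceeds the remaining budget contribute nothing.
        val = sum(p * max_reach_prob(s2, budget - c)
                  for p, s2, c in successors if c <= budget)
        best = max(best, val)
    return best
```

With budget 2 only the cheap action is usable, so the optimum comes from retrying it; with budget 3 the reliable action becomes affordable and dominates. A threshold query "is there a scheduler achieving probability at least 0.8 under budget 3?" then reduces to comparing this value against 0.8.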
