The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes

We study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state q is never worse than a state p if the maximal probability of reaching the target set from p is at most the maximal probability of reaching it from q, regardless of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all special cases of the equivalence relation induced by the NWR. Using the NWR, states in the same equivalence class can be collapsed, after which actions leading to sub-optimal states can be removed. We show that the natural decision problem associated with computing the NWR is coNP-complete. Finally, we describe an incomplete polynomial-time iterative algorithm to under-approximate the NWR.
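The definition of the NWR can be illustrated with a small sketch. The code below is not the algorithm from the paper: it assumes a hypothetical four-state MDP (`p`, `q`, `goal`, `sink`) built by an illustrative helper `build`, computes maximal reachability probabilities by plain value iteration (`max_reach`), and samples transition probabilities to look for a counterexample to "q is never worse than p". Sampling can only refute the relation for a pair of states; it never proves it.

```python
import random
from typing import Dict, List, Set

# Assumed encoding: state -> list of actions, each action maps successor -> probability.
MDP = Dict[str, List[Dict[str, float]]]


def max_reach(mdp: MDP, targets: Set[str], iters: int = 1000,
              tol: float = 1e-12) -> Dict[str, float]:
    """Approximate maximal reachability probabilities by value iteration from below."""
    values = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(iters):
        delta = 0.0
        for s, actions in mdp.items():
            if s in targets or not actions:
                continue  # target states stay at 1, states without actions stay at 0
            new = max(sum(pr * values[t] for t, pr in a.items()) for a in actions)
            delta = max(delta, abs(new - values[s]))
            values[s] = new
        if delta < tol:
            break
    return values


def build(x: float, y: float) -> MDP:
    """Hypothetical MDP: p moves to q with probability y, q reaches goal with probability x."""
    return {
        "p": [{"q": y, "sink": 1.0 - y}],
        "q": [{"goal": x, "sink": 1.0 - x}],
        "goal": [],
        "sink": [],
    }


def refuted_by_sampling(p: str, q: str, trials: int = 200, seed: int = 0) -> bool:
    """Return True if some sampled labelling gives p a strictly larger value than q."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(0.01, 0.99), rng.uniform(0.01, 0.99)
        v = max_reach(build(x, y), {"goal"})
        if v[p] > v[q] + 1e-9:
            return True
    return False
```

In this toy graph the value of `p` is the product of `y` and the value of `q`, so no labelling can make `p` better than `q`, and `refuted_by_sampling("p", "q")` finds no counterexample. The reverse query, `refuted_by_sampling("q", "p")`, is refuted by the first sample.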
