Efficient algorithms for Risk-Sensitive Markov Decision Processes with limited budget

Abstract We tackle the problem of finding optimal policies for Markov Decision Processes that minimize the probability of the cumulative cost exceeding a given budget. This task falls under the umbrella of Risk-Sensitive Markov Decision Processes, which optimize a non-additive, non-linear function of the cumulative cost that incorporates the user's attitude towards risk. Current algorithms for solving this task, for every budget less than or equal to a user-defined budget, scale poorly when the support of the cost function is large, since they operate in an augmented state space that enumerates all possible remaining budgets. To circumvent this issue, we develop (i) an improved version of the Topological Value Iteration with Dynamic Programming algorithm (tvi-dp), and (ii) the first symbolic dynamic programming algorithm for this class of problems, called rs-spudd, which exploits conditional independence in the transition function of the augmented state space. The proposed algorithms improve efficiency by pruning irrelevant states and terminating early, without sacrificing optimality. Empirical results show that rs-spudd is able to solve problems up to 10^3 times larger than tvi-dp.
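The central object in the abstract is the augmented state space over pairs (s, b) of an MDP state and a remaining budget, in which the probability of staying within budget obeys a standard Bellman recursion (minimizing the probability of exceeding the budget is equivalent to maximizing the probability of finishing within it). The sketch below is a minimal flat (non-symbolic) value iteration over that space; it is not the paper's tvi-dp or rs-spudd, and all identifiers (solve, S, A, P, C, GOAL, B0) are hypothetical placeholders. It assumes a goal-oriented MDP with strictly positive integer costs, so remaining budgets can be enumerated bottom-up.

```python
# Minimal sketch (not the paper's implementation) of flat value iteration
# over the augmented state space (s, b), where b is the remaining budget.
# Assumptions: goal-oriented MDP, strictly positive integer costs; the
# names S, A, P, C, GOAL, B0 are all hypothetical placeholders.

def solve(S, A, P, C, GOAL, B0):
    """P[s][a] -> list of (next_state, prob); C[s][a] -> positive int cost.
    Returns V[(s, b)] = max probability of reaching GOAL with cost <= b."""
    V = {}
    # Enumerate remaining budgets bottom-up: since costs are strictly
    # positive, V(s, b) depends only on values at smaller budgets.
    for b in range(B0 + 1):
        for s in S:
            if s == GOAL:
                V[(s, b)] = 1.0  # already at the goal: success for any b >= 0
                continue
            best = 0.0
            for a in A:
                c = C[s][a]
                if c > b:
                    continue  # taking a would exceed the remaining budget
                q = sum(p * V[(s2, b - c)] for s2, p in P[s][a])
                best = max(best, q)
            V[(s, b)] = best  # stays 0.0 if every action busts the budget
    return V


if __name__ == "__main__":
    # Toy instance: one action that reaches the goal w.p. 0.8 at cost 1.
    S, A, GOAL = ["s0", "G"], ["go"], "G"
    P = {"s0": {"go": [("G", 0.8), ("s0", 0.2)]}}
    C = {"s0": {"go": 1}}
    V = solve(S, A, P, C, GOAL, B0=3)
    print(V[("s0", 3)])  # P(reach G with cumulative cost <= 3) = 0.992
```

The sketch makes the scaling problem concrete: every extra unit of budget adds another |S|-sized layer of augmented states, which is exactly the blow-up that tvi-dp's pruning and early termination, and rs-spudd's decision-diagram factoring, are designed to avoid.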
