Revisiting Risk-Sensitive MDPs: New Algorithms and Results

While Markov Decision Processes (MDPs) have been shown to be effective models for planning under uncertainty, the objective of minimizing the expected cumulative cost is inappropriate for high-stakes planning problems. To address this, Yu, Lin, and Yan (1998) introduced the Risk-Sensitive MDP (RS-MDP) model, in which the objective is to find a policy that maximizes the probability that the cumulative cost stays within a user-defined cost threshold. In this paper, we revisit this problem and introduce new algorithms based on classical techniques, such as depth-first search and dynamic programming, as well as on the recently introduced Topological Value Iteration (TVI) technique. We demonstrate the applicability of our algorithms on randomly generated MDPs as well as on domains from the ICAPS 2011 International Probabilistic Planning Competition (IPPC).
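To make the RS-MDP criterion concrete, below is a minimal Python sketch of the dynamic-programming recursion it induces over augmented (state, remaining-budget) pairs. The toy MDP, the names TRANSITIONS, COSTS, GOAL, and success_prob, and the assumption of strictly positive integer costs are ours for illustration; this is not the paper's algorithm, only the textbook recursion it builds on. The value of a pair is the best achievable probability of reaching the goal before the remaining budget goes negative, computed here by depth-first recursion with memoization, in the spirit of the classical techniques the abstract mentions.

```python
from functools import lru_cache

# Illustrative toy MDP (not from the paper): states s0, s1, and goal g.
GOAL = "g"

# TRANSITIONS[state][action] -> list of (successor, probability)
TRANSITIONS = {
    "s0": {"a": [("s1", 0.5), ("g", 0.5)], "b": [("g", 1.0)]},
    "s1": {"a": [("g", 1.0)]},
}

# COSTS[state][action] -> strictly positive integer cost, so the
# depth-first recursion over (state, remaining budget) terminates.
COSTS = {
    "s0": {"a": 1, "b": 3},
    "s1": {"a": 1},
}

@lru_cache(maxsize=None)
def success_prob(state, budget):
    """Max probability of reaching GOAL with cumulative cost <= budget."""
    if budget < 0:
        return 0.0  # cost threshold already exceeded on this trajectory
    if state == GOAL:
        return 1.0  # goal reached within the threshold
    return max(
        sum(p * success_prob(nxt, budget - COSTS[state][act])
            for nxt, p in outcomes)
        for act, outcomes in TRANSITIONS[state].items()
    )

if __name__ == "__main__":
    print(success_prob("s0", 2))  # -> 1.0 in this toy example
```

A full solver would sweep the augmented state space bottom-up rather than relying on recursion depth; ordering that sweep by the strongly connected components of the augmented space is where TVI-style methods come in.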

[1] Hoong Chuin Lau et al. Dynamic Stochastic Orienteering Problems for Risk-Aware Applications, 2012, UAI.

[2] Bart Selman et al. Probabilistic planning with non-linear utility functions and worst-case guarantees, 2012, AAMAS.

[3] Hector Geffner et al. Heuristic Search for Generalized Stochastic Shortest Path MDPs, 2011, ICAPS.

[4] John N. Tsitsiklis et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.

[5] Paul R. Cohen et al. Trial by Fire: Understanding the Design Requirements for Agents in Complex Environments, 1989, AI Mag.

[6] Manuela M. Veloso et al. Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games, 2007, AAAI.

[7] Sean R. Eddy et al. What is dynamic programming?, 2004, Nature Biotechnology.

[8] Andrey Kolobov et al. Scalable Methods and Expressive Models for Planning Under Uncertainty, 2016.

[9] Peng Dai et al. Topological Value Iteration Algorithms, 2011, J. Artif. Intell. Res.

[10] Bart Selman et al. Risk-Sensitive Policies for Sustainable Renewable Resource Allocation, 2011, IJCAI.

[11] Shlomo Zilberstein et al. Decision-Theoretic Control of Planetary Rovers, 2001, Advances in Plan-Based Control of Robotic Agents.

[12] Sven Koenig et al. Risk-Sensitive Planning with One-Switch Utility Functions: Value Iteration, 2005, AAAI.

[13] Robert E. Tarjan et al. Depth-First Search and Linear Graph Algorithms, 1972, SIAM J. Comput.

[14] E. Altman. Constrained Markov Decision Processes, 1999.

[15] Erann Gat et al. An Autonomous Spacecraft Agent Prototype, 1998, Auton. Robots.

[16] T. Tsiligirides et al. Heuristic Methods Applied to Orienteering, 1984.

[17] Stella X. Yu et al. Optimization Models for the First Arrival Target Distribution Function in Discrete Time, 1998.

[18] Blai Bonet et al. Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback, 2003, IJCAI.

[19] Michel Gendreau et al. The orienteering problem with stochastic travel and service times, 2011, Ann. Oper. Res.

[20] Sven Koenig et al. Functional Value Iteration for Decision-Theoretic Planning with General Utility Functions, 2006, AAAI.

[21] Dimitri P. Bertsekas et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[22] Edmund H. Durfee et al. Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors, 2005, IJCAI.

[23] Sven Koenig et al. An exact algorithm for solving MDPs under risk-sensitive planning objectives with one-switch utility functions, 2008, AAMAS.

[24] Pradeep Varakantham et al. Optimization Approaches for Solving Chance Constrained Stochastic Orienteering Problems, 2013, ADT.

[25] Alok Aggarwal et al. Cooperative Multiobjective Decision Support for the Paper Industry, 1999, Interfaces.

[26] Louis Wehenkel et al. Risk-aware decision making and dynamic programming, 2008.

[27] Frederick Y. Wu et al. A decision-support system for quote generation, 2002, AAAI/IAAI.

[28] Jim Blythe et al. Decision-Theoretic Planning, 1999, AI Mag.