Risk-sensitive planning in partially observable environments

The Partially Observable Markov Decision Process (POMDP) is a popular framework for planning under uncertainty in partially observable domains. Yet the POMDP model is risk-neutral: it assumes that the agent maximizes the expected reward of its actions. In contrast, domains like financial planning often require the agent's decisions to be risk-sensitive, that is, to maximize the utility of its actions for a non-linear utility function. Unfortunately, existing POMDP solvers cannot solve such planning problems exactly. By considering piecewise linear approximations of utility functions, this paper addresses this shortcoming with three contributions: (i) it defines the Risk-Sensitive POMDP model; (ii) it derives the fundamental properties of the underlying value functions and provides a functional value iteration technique to compute them exactly; and (iii) it proposes an efficient procedure for determining dominated value functions, which speeds up the algorithm. Our experiments show that the proposed approach is feasible and applicable to realistic financial planning domains.
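As a rough illustration of the piecewise linear machinery the abstract refers to, the sketch below approximates a risk-averse utility function by linear segments and includes a pointwise-dominance test of the kind that can justify pruning a value function. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function names (`pwl_approximation`, `dominates`), the uniform breakpoints, and the exponential utility are all illustrative choices.

```python
import numpy as np

def pwl_approximation(utility, lo, hi, n_pieces):
    """Approximate a 1-D utility function by a piecewise linear interpolant
    with breakpoints spaced uniformly on [lo, hi].
    (Illustrative assumption: uniform breakpoints; the paper does not
    prescribe this scheme.)"""
    xs = np.linspace(lo, hi, n_pieces + 1)
    ys = utility(xs)
    return xs, ys

def pwl_eval(xs, ys, w):
    """Evaluate the piecewise linear interpolant at wealth level w."""
    return np.interp(w, xs, ys)

def dominates(ys_a, ys_b, eps=1e-9):
    """Check whether PWL function A pointwise dominates B. For two PWL
    functions sharing the same breakpoints, comparing values at the
    breakpoints is exact, since their difference is linear on each
    segment; a dominated function can then be pruned."""
    return bool(np.all(ys_a >= ys_b - eps))

# Example: a risk-averse exponential utility U(w) = 1 - exp(-gamma * w)
# (a standard textbook choice, used here only for illustration).
gamma = 0.5
utility = lambda w: 1.0 - np.exp(-gamma * w)
xs, ys = pwl_approximation(utility, lo=0.0, hi=10.0, n_pieces=8)
print(pwl_eval(xs, ys, 3.7))  # approximate utility of wealth 3.7
```

Representing utilities this way keeps the backed-up value functions piecewise linear in the accumulated wealth, which is what makes an exact functional value iteration tractable; the dominance test above is the simplest form of the pruning the third contribution refers to.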
