Towards Faster Planning with Continuous Resources in Stochastic Domains

Agents often have to construct plans that obey resource limits for continuous resources whose consumption can only be characterized by probability distributions. While Markov Decision Processes (MDPs) with a state space of continuous and discrete variables are popular for modeling these domains, current algorithms for such MDPs scale poorly as the state space grows. To remedy this, we propose an algorithm called DPFP. DPFP's key contribution is its exploitation of the dual space of cumulative distribution functions. This dual formulation is key to DPFP's novel combination of three features. First, it enables DPFP's membership in a class of algorithms that perform forward search in a large (possibly infinite) policy space. Second, it provides a new and efficient approach for varying the policy generation effort based on the likelihood of reaching different regions of the MDP state space. Third, it yields a bound on the error produced by such approximations. Together, these three features allow DPFP's superior performance and a systematic trade-off of optimality for speed. Our experimental evaluation shows that, when run stand-alone, DPFP outperforms other algorithms in terms of its any-time performance, whereas when run as a hybrid, it allows for a significant speedup of a leading continuous resource MDP solver.
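
To make the second feature concrete, the following is a minimal, hypothetical sketch (not the authors' DPFP implementation) of forward search over a finite-horizon MDP with one continuous resource, where a branch is expanded only if the probability of reaching it exceeds a threshold. The function names, the discretized transition model, and the `epsilon` cutoff are illustrative assumptions; DPFP itself operates on cumulative distribution functions in the dual space rather than on sampled outcomes.

```python
# Hypothetical sketch: forward search that varies effort by reachability.
# Assumed interfaces: actions(state) -> iterable of actions;
# transition(state, action) -> iterable of
# (probability, next_state, resource_cost, immediate_reward) tuples.

def forward_search(state, budget, depth, reach_prob,
                   actions, transition, horizon, epsilon=1e-3):
    """Estimate the best expected reward from `state` with remaining
    resource `budget`, skipping regions reached with probability < epsilon."""
    if depth == horizon or budget <= 0.0:
        return 0.0
    best = 0.0
    for a in actions(state):
        value = 0.0
        for p, s2, cost, r in transition(state, a):
            p_next = reach_prob * p
            if p_next < epsilon or cost > budget:
                continue  # prune unlikely or resource-infeasible branches
            value += p * (r + forward_search(s2, budget - cost, depth + 1,
                                             p_next, actions, transition,
                                             horizon, epsilon))
        best = max(best, value)
    return best


# Toy usage: one action with stochastic resource consumption.
if __name__ == "__main__":
    acts = lambda s: ["go"]
    trans = lambda s, a: [(0.7, s, 1.0, 1.0), (0.3, s, 3.0, 1.0)]
    print(forward_search("s0", budget=5.0, depth=0, reach_prob=1.0,
                         actions=acts, transition=trans, horizon=4))
```

The `epsilon` threshold is what lets such a search spend less effort on low-probability regions of the state space; the error-bound result mentioned in the abstract is what makes this kind of pruning principled in DPFP, though the bound itself is not reproduced in this sketch.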
