Extending Classical Planning Heuristics to Probabilistic Planning with Dead-Ends

Recent domain-determinization techniques have been very successful in many probabilistic planning problems. We claim that traditional heuristic MDP algorithms have been unsuccessful due mostly to the lack of efficient heuristics in structured domains. Previous attempts like mGPT used classical planning heuristics to an all-outcome determinization of MDPs without discount factor ; yet, discounted optimization is required to solve problems with potential dead-ends. We propose a general extension of classical planning heuristics to goal-oriented discounted MDPs, in order to overcome this flaw. We apply our theoretical analysis to the well-known classical planning heuristics hmax and hadd, and prove that the extended hmax is admissible. We plugged our extended heuristics to popular graph-based (Improved-LAO*, LRTDP, LDFS) and ADD-based (sLAO*, sRTDP) MDP algorithms: experimental evaluations highlight competitive results compared with the winners of previous competitions (FF-REPLAN, FPG, RFF), and show that our discounted heuristics solve more problems than non-discounted ones, with better criteria values. As for classical planning, the extended hadd outperforms the extended hmax on most problems.

[1]  Håkan L. S. Younes,et al.  The First Probabilistic Track of the International Planning Competition , 2005, J. Artif. Intell. Res..

[2]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[3]  Shlomo Zilberstein,et al.  Symbolic Generalization for On-line Planning , 2002, UAI.

[4]  Shlomo Zilberstein,et al.  LAO*: A heuristic search algorithm that finds solutions with loops , 2001, Artif. Intell..

[5]  Blai Bonet,et al.  mGPT: A Probabilistic Planner Based on Heuristic Search , 2005, J. Artif. Intell. Res..

[6]  Zhengzhu Feng,et al.  Symbolic heuristic search for factored Markov decision processes , 2002, AAAI/IAAI.

[7]  Blai Bonet,et al.  Learning Depth-First Search: A Unified Approach to Heuristic Search in Deterministic and Non-Deterministic Settings, and Its Application to MDPs , 2006, ICAPS.

[8]  Wheeler Ruml,et al.  Improving Determinization in Hindsight for On-line Probabilistic Planning , 2010, ICAPS.

[9]  Blai Bonet,et al.  Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming , 2003, ICAPS.

[10]  Olivier Buffet,et al.  FF + FPG: Guiding a Policy-Gradient Planner , 2007, ICAPS.

[11]  Blai Bonet,et al.  Planning as heuristic search , 2001, Artif. Intell..

[12]  Mausam,et al.  Classical Planning in MDP Heuristics: with a Little Help from Generalization , 2010, ICAPS.

[13]  Olivier Buffet,et al.  The factored policy-gradient planner , 2009, Artif. Intell..

[14]  Jörg Hoffmann,et al.  FF: The Fast-Forward Planning System , 2001, AI Mag..

[15]  Robert Givan,et al.  FF-Replan: A Baseline for Probabilistic Planning , 2007, ICAPS.

[16]  Ugur Kuter,et al.  Incremental plan aggregation for generating policies in MDPs , 2010, AAMAS.