Learning Measures of Progress for Planning Domains

We study an approach to learning heuristics for planning domains from example solutions. There has been little work on learning heuristics for the types of domains used in deterministic and stochastic planning competitions. Perhaps one reason for this is the challenge of providing a compact heuristic language that facilitates learning. Here we introduce a new representation for heuristics based on lists of set expressions described using taxonomic syntax. Next, we review the idea of a measure of progress (parmar 2002), which is any heuristic that is guaranteed to be improvable at every state. We take finding a measure of progress as our learning goal, and describe a simple learning algorithm for this purpose. We evaluate our approach across a range of deterministic and stochastic planning-competition domains. The results show that often greedily following the learned heuristic is highly effective. We also show our heuristic can be combined with learned rule-based policies, producing still stronger results.