Search Control of Plan Generation in Decision-Theoretic Planners

This paper addresses the search control problem of selecting which plan to refine next, a choice point common to the decision-theoretic planners created to date. Such planners can use a utility function to calculate bounds on the expected utility of an abstract plan. Three strategies for using these bounds to select the next plan to refine have been proposed in the literature. We examine the rationale for each strategy and prove that, when looking for all plans with the highest expected utility, the optimistic strategy of always selecting a plan with the highest upper bound on expected utility expands the fewest plans. When looking for a single plan with the highest expected utility, we prove that the optimistic strategy has the best possible worst-case performance and that other strategies can fail to terminate. To demonstrate the effect of plan selection strategies on performance, we give results using the DRIPS planner showing that the optimistic strategy can produce exponential improvements in time and space.
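
As an illustration of the optimistic strategy only, the following minimal sketch searches a small hand-built plan hierarchy for a single best plan. It is not the planner's implementation. The `Plan` class, the `optimistic_search` function, and the example bounds are all hypothetical. The sketch assumes that each abstract plan carries a lower and an upper bound on expected utility, that refinement replaces a plan with its children, that a child's bounds lie within its parent's, and that a fully refined plan has a point-valued expected utility (lower equals upper).

```python
import heapq
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan node with bounds on expected utility."""
    name: str
    lower: float
    upper: float
    children: Tuple["Plan", ...] = ()  # refinements; empty for a concrete plan

    @property
    def is_concrete(self) -> bool:
        return not self.children


def optimistic_search(root: Plan) -> Tuple[Plan, int]:
    """Always refine a plan with the highest upper bound.

    Returns the selected concrete plan and the number of plans expanded.
    """
    # heapq is a min-heap, so upper bounds are negated; the counter breaks
    # ties so that Plan objects are never compared directly.
    frontier = [(-root.upper, 0, root)]
    counter = 1
    expanded = 0
    while frontier:
        _, _, plan = heapq.heappop(frontier)
        if plan.is_concrete:
            # Under the stated assumptions, its utility is at least the
            # upper bound of every plan still on the frontier.
            return plan, expanded
        expanded += 1
        for child in plan.children:
            heapq.heappush(frontier, (-child.upper, counter, child))
            counter += 1
    raise ValueError("no concrete plan reachable from root")


# Made-up hierarchy: branch B's upper bound (5) never beats A2's utility (8).
a = Plan("A", 2, 9, (Plan("A1", 6, 6), Plan("A2", 8, 8)))
b = Plan("B", 0, 5, (Plan("B1", 5, 5), Plan("B2", 1, 1)))
root = Plan("root", 0, 10, (a, b))

best, expanded = optimistic_search(root)
print(best.name, expanded)
```

In this example the search expands only `root` and `A`. Plan `B` is never refined, because its upper bound is below the utility of the concrete plan that is eventually selected.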