Planning deals with finding a sequence of actions that transforms the world from a given initial state to a state satisfying a certain goal condition [8]. For the purposes of this paper we can define a planning problem simply as a state-transition system whose states are the world states and whose transitions correspond to the application of actions.

States are defined by the values of state variables. Let X be the set of state variables; each variable xi has a finite domain Di of possible values. A state s is then a mapping from X to the union of the domains, s : X → ⋃i Di, where ∀i : s(xi) ∈ Di. The state space is the Cartesian product of the variables' domains: Space = ∏i Di.

Every state s ∈ Space has an assigned (possibly empty) set of successor states, denoted succ(s); every t ∈ succ(s) is labeled by the action that transforms s into t (i.e., performing actions changes the values of state variables). The task is to find a path p in this state-transition system that leads from a given initial state to some state satisfying the goal condition (a goal state): p = (s0, s1, . . . , sn), where s0 is the initial state, sn is a goal state, and ∀ 0 ≤ i < n : si+1 ∈ succ(si). Such a path is called a solution plan. The goal is to reach a state in which some variables have specified values.

One of the most promising approaches to solving the planning problem (based on the results of several International Planning Competitions [2]) is heuristic-guided forward search, mostly in the form of A* or hill-climbing. These approaches use a heuristic estimate to guide the search, and the accuracy of the heuristic estimator has a great impact on performance. Hence designing a powerful and easy-to-compute heuristic is of paramount importance. Heuristics are usually based on relaxations of the problem.
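As a concrete illustration (not taken from the paper), the formulation above can be sketched in code: states map variables to values, succ(s) is generated by applicable actions, and a heuristic-guided A* search looks for a solution plan. The toy one-variable problem, the action names, and the goal-count heuristic are all hypothetical choices made for this sketch.

```python
from heapq import heappush, heappop

def successors(state, actions):
    """Yield (action_name, next_state) for each action applicable in state."""
    for name, pre, eff in actions:
        if all(state.get(v) == val for v, val in pre.items()):
            nxt = dict(state)
            nxt.update(eff)  # performing an action changes state-variable values
            yield name, nxt

def astar(init, goal, actions, h):
    """Heuristic-guided forward search (A*, unit action costs)."""
    start = tuple(sorted(init.items()))
    frontier = [(h(init), 0, start, [])]   # entries: (f = g + h, g, state key, plan)
    best_g = {start: 0}
    while frontier:
        _, g, key, plan = heappop(frontier)
        state = dict(key)
        if all(state.get(v) == val for v, val in goal.items()):
            return plan                     # a solution plan: path to a goal state
        for name, nxt in successors(state, actions):
            nkey = tuple(sorted(nxt.items()))
            if g + 1 < best_g.get(nkey, float("inf")):
                best_g[nkey] = g + 1
                heappush(frontier, (g + 1 + h(nxt), g + 1, nkey, plan + [name]))
    return None                             # no goal state is reachable

# Hypothetical toy problem: one variable "at" with domain {A, B, C}.
actions = [
    ("move-A-B", {"at": "A"}, {"at": "B"}),
    ("move-B-C", {"at": "B"}, {"at": "C"}),
]
goal = {"at": "C"}
h = lambda s: sum(s.get(v) != val for v, val in goal.items())  # goal-count heuristic
plan = astar({"at": "A"}, goal, actions, h)
# plan == ["move-A-B", "move-B-C"]
```

The goal-count heuristic used here is only a stand-in; the point is the interface: any estimator h(s) can guide the same search, which is why heuristic accuracy dominates performance.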
When estimating the quality of the best solution, we relax the problem by ignoring some of its constraints (making it easier to solve), solve the relaxed problem, and use the quality of the relaxed solution as a lower bound on the quality of the best solution to the original problem. In planning, this principle is represented by the well-known delete-relaxation heuristic and its variants [8, 3, 4]. Heuristics based on this principle often work well, but in some situations they greatly underestimate the real value, making them inaccurate (see [6] for an example). Delete relaxation allows the state variables to hold several values simultaneously, so a relaxed state subsumes several ordinary states. Furthermore, performing actions (i.e., making transitions) only adds new elements to the set of values each variable currently holds and never removes any; hence the set of ordinary states subsumed by the relaxed state monotonically increases along every path. A path is a relaxed solution plan if it leads to a relaxed state which subsumes some goal state.
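The accumulate-and-never-delete behavior described above can be sketched as a fixpoint computation (again a hypothetical illustration, reusing the toy actions from the earlier sketch): each variable holds a growing set of values, and we count expansion layers until the relaxed state subsumes a goal state.

```python
def relaxed_layers(init, goal, actions):
    """Delete-relaxation reachability: each variable accumulates values and
    never loses any. Returns the number of expansion layers needed before
    every goal value is held (an h_max-style estimate), or None if even the
    relaxed problem is unsolvable."""
    held = {v: {val} for v, val in init.items()}     # relaxed state: sets of values
    layers = 0
    while not all(val in held.get(v, set()) for v, val in goal.items()):
        new = {v: set(vals) for v, vals in held.items()}
        for _, pre, eff in actions:
            if all(val in held.get(v, set()) for v, val in pre.items()):
                for v, val in eff.items():
                    new.setdefault(v, set()).add(val)  # add only, never delete
        if new == held:
            return None          # fixpoint reached without subsuming a goal state
        held = new
        layers += 1
    return layers

# Hypothetical toy actions: moving A -> B -> C via one variable "at".
actions = [
    ("move-A-B", {"at": "A"}, {"at": "B"}),
    ("move-B-C", {"at": "B"}, {"at": "C"}),
]
estimate = relaxed_layers({"at": "A"}, {"at": "C"}, actions)
# estimate == 2: layer 1 adds B, layer 2 adds C, while A stays held throughout
```

Note how the relaxed state {"at": {A, B, C}} subsumes three ordinary states at once, and how the subsumed set only grows along the path, exactly the monotonicity that makes the relaxation cheap to solve but prone to underestimation.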
[1] Jörg Hoffmann. Where Ignoring Delete Lists Works, Part II: Causal Graphs. ICAPS, 2011.
[2] Paolo Traverso et al. Automated Planning: Theory & Practice. 2004.
[3] Simon M. Lucas et al. A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, 2012.
[4] Jörg Hoffmann et al. Red-Black Relaxed Plan Heuristics Reloaded. SOCS, 2013.
[5] J. Hoffmann et al. Where 'Ignoring Delete Lists' Works: Local Search Topology in Planning Benchmarks. J. Artif. Intell. Res., 2005.
[6] Carmel Domshlak et al. Who Said We Need to Relax All Variables? ICAPS, 2013.
[7] Carmel Domshlak et al. Red-Black Relaxed Plan Heuristics. AAAI, 2013.