Wizard: Compiled Macro-Actions for Planner-Domain Pairs

This paper describes Wizard, a generalised macro-learning method that participated in the Learning Track of the 6th International Planning Competition. Given a planner, a domain, and a few example problems, Wizard suggests macros that might help the planner solve future problems in the domain faster. This implementation compiles macros into regular actions in the STRIPS and FLUENTS subsets of PDDL.

Introduction

Wizard is an automated method that suggests macros for a given planner-domain pair. Using a few example problems, it learns macros that might help the planner solve future problems in the domain faster. The following are the features of the implementation reported in this paper.

• Wizard deals only with the acquisition of macros; for their representation and exploitation during planning, it relies on the support available. Currently PDDL does not support macros, nor do planners reason about them. Wizard therefore compiles macros into actions and adds them to the domain. However, compilation of macros into regular actions is possible only with the propositional and numerical constructs of PDDL.

• Wizard does not explicitly discover or exploit any specific planner or domain properties. It works with arbitrary planners and domains, where arbitrariness is in the characteristics borne or exhibited. Note that most existing macro-learning methods rely on specific characteristics; for example, MARVIN (Coles and Smith 2007) depends on plateaus in its heuristic profile and symmetries in domains, while Macro-FF (Botea et al. 2005) assumes component-level abstractions in domains and certain causal links in plans. Wizard, by contrast, is suitable when improving performance is the objective but specific characteristics are either not known or of no concern.

• Wizard learns individual macros by exploring the entire macro space (restricted by given limits on macro-length and parameter-count). Thus, it can learn any type of macro. This means Wizard can learn macros that are not observable from the given examples, that are not learnt by any existing methods, and that have action orderings not explored by the planner. Note that most existing macro-learning methods trigger their macro-generation procedures at certain specific events and learn only macros that are observable from the given examples.

• Wizard explores the macro-set space (restricted by a set-size limit) to learn collections of macros that maximise performance by interacting among themselves. A collection of only individually top-performing macros may not collaborate well. Also, macros in a top-performing macro-set may not be individually top-performing. Note that, unlike Wizard, most existing macro-learning methods do not take these points into account and suggest only an arbitrarily chosen, very small number (e.g. 2) of top-performing macros.

• Wizard adopts an evolutionary method to explore the macro and macro-set spaces. It generates macros using actions lifted from generalised plans of small example problems. To evaluate them, it solves other large example problems with and without the macros and then measures the weighted time gains. For macro-set generation, Wizard learns individual macros first and then uses them as constituent macros; the macro-set evaluation procedure, however, is the same as the one used in macro evaluation.

• Wizard does not learn macros that comprise any looping structures (e.g. execute a move action while a certain condition holds). It has no mechanism to infer loops from an action sequence. Thus any repetition of actions remains only as a static action sequence. To the best of our knowledge, no macro-learning method in the literature learns looping structures. No PDDL-based non-learning planner reasons about them either.

The rest of this paper describes Wizard's design and implementation. It also discusses where Wizard can be expected to succeed and where it cannot.
Search Algorithm

Figure 1 shows Wizard's macro and/or macro-set exploration method, which is based on an evolutionary algorithm. Evolutionary algorithms repeatedly (for a number of epochs) generate new individuals (macros or macro-sets in this case) from current individuals by using genetic operators; only the best individuals (as judged by fitness values), however, survive through successive epochs. Genetic operators provide search diversity by exploring other possible individuals in the neighbourhood of the current individuals, while evaluation methods provide converging search guidance by keeping only the best individuals; maintaining a balance between the two is therefore crucially important.

1. Initialise the population and evaluate each individual to assign a numerical rating.
2. Repeat the following steps for a given number of epochs.
   (a) Repeat the following steps for a number equal to the population size.
       i. Generate an individual using randomly selected operators and operands, and exit if a new individual is not found in a reasonable number of attempts.
       ii. Evaluate the generated individual and assign a numerical rating.
   (b) Replace inferior current individuals by superior new individuals and exit if replacement is not satisfactory.
   (c) Exit if generation of a new individual failed.
3. Suggest the best individuals as the output of the algorithm.

Figure 1: Wizard's evolutionary search algorithm taking individuals either as macros or as macro-sets.

Wizard explores the macro space first and then, using the learnt macros, builds the macro-set space. The macro space is restricted by limits on action-count and parameter-count. Similarly, the macro-set space is restricted by a limit on the set-size. Both search spaces nevertheless remain huge, as macros with any number of actions and macro-sets with any number of macros are to be explored within these limits. This means that brute-force or systematic but exhaustive search methods are not very suitable.
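The loop in Figure 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not Wizard's implementation: the function name `evolve`, the `generate` and `evaluate` callbacks, and the `max_attempts` bound are hypothetical, and "replacement is not satisfactory" is assumed here to mean that no new individual displaced a current one.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")  # an individual: a macro or a macro-set (must be hashable)


def evolve(
    population: List[T],
    generate: Callable[[List[T], random.Random], Optional[T]],
    evaluate: Callable[[T], float],
    epochs: int,
    rng: random.Random,
    max_attempts: int = 50,  # assumed meaning of "reasonable number of attempts"
) -> List[Tuple[float, T]]:
    """Return (rating, individual) pairs, best first, after a Figure 1-style search."""
    # Step 1: evaluate the initial population.
    rated = sorted(((evaluate(ind), ind) for ind in population),
                   key=lambda pair: pair[0], reverse=True)
    size = len(rated)
    for _ in range(epochs):                        # Step 2
        offspring: List[Tuple[float, T]] = []
        failed = False
        for _ in range(size):                      # Step 2(a)
            seen = {ind for _, ind in rated} | {ind for _, ind in offspring}
            child = None
            for _ in range(max_attempts):          # Step 2(a)i
                candidate = generate([ind for _, ind in rated], rng)
                if candidate is not None and candidate not in seen:
                    child = candidate
                    break
            if child is None:
                failed = True
                break
            offspring.append((evaluate(child), child))  # Step 2(a)ii
        # Step 2(b): keep the best `size` individuals among current and new ones.
        merged = sorted(rated + offspring,
                        key=lambda pair: pair[0], reverse=True)[:size]
        replaced = sum(1 for pair in merged if pair in offspring)
        rated = merged
        # Steps 2(b) and 2(c): stop on an unproductive epoch or a failed generation.
        if failed or replaced == 0:
            break
    return rated                                   # Step 3
```

In such a sketch, `generate` would wrap the genetic operators and `evaluate` would wrap the planner runs that measure time gains; the same loop serves both the macro space and the macro-set space because it never inspects what an individual is.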
Wizard therefore adopts an evolutionary approach to obtain search guidance.

Macro Generation

Wizard represents macros both as generalised action sequences and as resultant actions having parameters, preconditions and effects (see Figure 2). While the action sequences are used for macro generation, the resultant actions, when added to the domains, facilitate macro exploitation during planning (note that non-learning planners support only actions). Genetic operators produce new action sequences from operand macros' constituent actions. The new sequences are then compiled into resultant actions by the well-known regression-based action composition¹.

Wizard first solves a number of small² seeding problems with the planner. It then generalises the plans (see Figure 2), replacing objects in the problems (e.g. bs) by variables having identical names (e.g. ?bs); however, the constants in the domain (e.g. in, out, left, and right) remain unchanged, as they normally have designated roles in the domain dynamics (Figure 2 does not follow this strictly). Wizard then uses the generalised actions in building macros. This has the advantage that macros occurring in plans serve as a baseline, and trying their neighbourhoods then makes the randomness of the search process somewhat guided. Further, parameters in actions lifted from generalised plans can be easily unified by matching their names. Furthermore, many domain-specific issues are normally found resolved in plans. Note that domain actions, if used directly (without any specific analysis) as constituent actions, do not offer these advantages.

¹ Action composition by regression is a binary, associative, and non-commutative operation on actions where the latter action's precondition and effect are subject to the former action's effect, and both actions' parameters are unified appropriately. For further details, please see (Newton et al. 2007).

² By problem size or difficulty level we mean the time required by the given planner to solve the problem with the original domain. Given a planner, a particular 10-blocks problem could be solved more quickly (and so be easier) than a particular 7-blocks problem.

Actual Plan         | Generalised Plan     | Macro & Resultant Action
(pick b1 left in)   | (pick ?b1 left in)   | (pick ?b3 left out)
(pick b2 right in)  | (pick ?b2 right in)  | (move out in)
(move in out)       | (move in out)        | (drop ?b3 left in)
(drop b1 left out)  | (drop ?b1 left out)  |
(drop b2 right out) | (drop ?b2 right out) | action pick-move-drop
(pick b3 left out)  | (pick ?b3 left out)  | parameter ?b3
(move out in)       | (move out in)        | precond (and . . . )
(drop b3 left in)   | (drop ?b3 left in)   | effect (and . . . )

Figure 2: Plan generalisation and macro construction.

Figure 3 shows the genetic operators used by Wizard in generating macros. The operators may not be minimal in any sense and mainly include various plausible local-search neighbourhood functions. For each macro, the proposed operators ensure exploration of a large number of its neighbours. Further motivations are as follows. Good/bad individuals normally remain in clusters. Discarding/adding/altering a good/bad component explores other individuals in the same cluster as an individual. Combining good/bad components of two individuals finds a third good/bad individual. Constructing individuals from scratch ensures diversity of the exploration.
Each letter represents an action with its parameters; macros are action sequences.

Plans      ABCDEFGHK | LMNPQ | RSTUVW | Plans of seeding probs
Macros     CDEFG (appears in 1st plan) | KQTV (random; an operand)
Extend     BCDEFG | CDEFGH | B precedes; H succeeds CDEFG in a plan
Shrink     CDEF | DEFG | Discard one action from either end of CDEFG
Split      CDE | FG | CD | EFG | Split CDEFG at a random position
Lift       MNP | STUV | Lift randomly but as appears exactly in a plan
Annex      PCDEFG | CDEFGP | Add P before or after CDEFG
Inject     CWDEFG | CDWEFG | CDEWFG | CDEFWG | Insert W
Delete     CEFG | CDFG | CDEG | Delete a middle action from CDEFG
Alter      VDEFG | CDVFG | CDEFV | Replace an action in CDEFG by V
Concat     CDEFGKQTV | KQTVCDEFG | Concat two macros either way
Crossover  CDETV | KQ
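As a rough illustration of plan generalisation (Figure 2) and of a few of the operators listed above, the sketch below treats a plan as a sequence of actions and a macro as a contiguous or edited subsequence. All function names are hypothetical, only Extend, Shrink, Split and Delete are covered, and compilation of a sequence into a resultant action by regression is not attempted here.

```python
import random
from typing import List, Set, Tuple

Action = Tuple[str, ...]  # e.g. ("pick", "b3", "left", "out")


def generalise(plan: List[Action], constants: Set[str]) -> List[Action]:
    """Replace problem objects by same-named variables; keep domain constants."""
    return [
        (act[0],) + tuple(arg if arg in constants else "?" + arg for arg in act[1:])
        for act in plan
    ]


def find(macro, plan) -> int:
    """Index of the first occurrence of macro as a contiguous run in plan, or -1."""
    n = len(macro)
    for i in range(len(plan) - n + 1):
        if plan[i:i + n] == macro:
            return i
    return -1


def extend(macro, plan) -> list:
    """Extend: add the plan action that precedes or succeeds the macro."""
    i = find(macro, plan)
    if i < 0:
        return []
    out = []
    if i > 0:
        out.append(plan[i - 1:i + len(macro)])
    if i + len(macro) < len(plan):
        out.append(plan[i:i + len(macro) + 1])
    return out


def shrink(macro) -> list:
    """Shrink: discard one action from either end."""
    return [macro[:-1], macro[1:]] if len(macro) > 1 else []


def split(macro, rng: random.Random) -> list:
    """Split: cut the macro in two at a random position."""
    if len(macro) < 2:
        return []
    k = rng.randrange(1, len(macro))
    return [macro[:k], macro[k:]]


def delete_middle(macro) -> list:
    """Delete: drop one action that is neither first nor last."""
    return [macro[:i] + macro[i + 1:] for i in range(1, len(macro) - 1)]
```

Because the operators only slice and concatenate, they work equally on lists of action tuples and on the letter strings used in the table, for example `extend("CDEFG", "ABCDEFGHK")` or `delete_middle("CDEFG")`.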

John Levine | Maria Fox | Derek Long | M. A. Hakim Newton

[1] John Levine, et al. Learning Macro-Actions for Arbitrary Planners and Domains, 2007, ICAPS.

[2] Jonathan Schaeffer, et al. Macro-FF: Improving AI Planning with Automatically Learned Macro-Operators, 2005, J. Artif. Intell. Res.

[3] Andrew Coles, et al. Marvin: A Heuristic Search Planner with Online Macro-Action Learning, 2007, J. Artif. Intell. Res.