6th International Planning Competition: Uncertainty Part

The 6th International Planning Competition will be co-located with ICAPS-08 in Sydney, Australia. The competition will contain three parts: i) the classical part, ii) the uncertainty part and iii) the learning part. This document presents the uncertainty part and its various tracks. The official site for the uncertainty part is http://ippc-2008.loria.fr/wiki/, where this document and other materials are provided.

Introduction

The 6th International Planning Competition (IPC-6) will be co-located with the 18th International Conference on Automated Planning and Scheduling (ICAPS-08) in Sydney, Australia on September 14–18, 2008. The competition is a biennial event in which a number of planning systems are evaluated on a variety of problems. It is an opportunity to compare existing algorithms and to provide the automated planning community with recognized benchmarks written in a standardized language: PDDL.

The IPC started in 1998, first taking place at AIPS and later at ICAPS. It was originally limited to classical planning, i.e. automated planning in deterministic domains. In 2004, a "probabilistic track", organized by Michael Littman and Håkan Younes, was introduced; it brought an extension of PDDL to probabilistic domains and a client-server plan evaluator (MDPSim). In 2006, this track, organized by Blai Bonet and Bob Givan, was enriched with a conformant subtrack and renamed the "nondeterministic track". This year's competition (2008) renames "tracks" as "parts" and includes three of them:

• the classical part, organized by Minh Do, Malte Helmert and Ioannis Refanidis (http://ipc.informatik.uni-freiburg.de/);
• the uncertainty part, organized by ourselves ("uncertainty" is preferred over "probabilistic" or "nondeterministic" to avoid possible confusion); and
• a novel learning part, organized by Alan Fern, Roni Khardon and Prasad Tadepalli (http://eecs.oregonstate.edu/ipc-learn/), in which planners can train on small instances of problems before being evaluated on large ones.

This document is a modified version of the previous edition's call for participation (Bonet & Givan 2005). It presents the uncertainty part, which is made of:

• a non-observable non-deterministic (conformant) track,
• a fully-observable non-deterministic track,
• a fully-observable probabilistic track and
• a partially-observable probabilistic track.

Planning Tasks and Solutions

Most of the competition will focus on shortest-path planning problems. This setting is more general than goal reachability with unit costs, as actions may have non-unit costs. Problems of this type can be described by models of the form:

M1. a finite state space (set of states) S,
M2. an initial state s0 ∈ S,
M3. a set SG ⊆ S of goal states,
M4. sets A(s) of applicable actions for each s ∈ S,
M5. a cost function c(s, a) → ℝ, and
M6. a non-deterministic transition function F(s, a) ⊆ S.

Models M1–M6 are described using a high-level planning language based on propositional logic, in which states are valuations of the propositional symbols, the sets of initial and goal states are described by logical formulae, and the applicable actions (operators) and the transition function are described by means of action schemata.
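To make the model concrete, the sketch below renders M1–M6 as an explicit-state Python structure. It is purely illustrative: the names (ShortestPathModel, applicable, transition, and so on) are ours, not part of PDDL or of any competition software, and in practice the state space is given implicitly by action schemata rather than enumerated.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set

State = str   # states stand in for valuations of the propositional symbols
Action = str

@dataclass(frozen=True)
class ShortestPathModel:
    """Explicit-state rendering of model M1-M6 (illustrative only)."""
    states: FrozenSet[State]                           # M1: finite state space S
    s0: State                                          # M2: initial state s0
    goals: FrozenSet[State]                            # M3: goal states SG
    applicable: Callable[[State], Set[Action]]         # M4: A(s)
    cost: Callable[[State, Action], float]             # M5: c(s, a) -> R
    transition: Callable[[State, Action], Set[State]]  # M6: F(s, a), a subset of S
```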
The form of a solution and the optimality criteria depend on the particular planning task, as follows.

Non-Observable Non-Deterministic Planning (NOND/Conformant Planning)

The problem of conformant planning is that of deciding whether there exists a linear sequence of actions that will achieve the goal from any initial state and under any resolution of the non-determinism in the problem (Goldman & Boddy 1996; Smith & Weld 1998). In non-observable or partially observable domains, belief states are used to represent one's belief about the possible current states. The belief state at time step n depends on the initial belief state b0 and on the history of past actions and observations (if any), hn = ⟨a0, o0, ..., an−1, on−1⟩, since the initial state. (We make the assumption that there is no observation prior to the first action.) In a non-deterministic setting, a non-deterministic belief state b is a set of states ("non-deterministic" is usually omitted when the context is clear). A conformant planning problem is then modeled with M1–M6, where M2 and M5 are redefined as:

M2'. an initial belief state b0 ⊆ S, and
M5'. a unit cost function c(s, a) = 1.

Given this model, we say that s0, a0, ..., an−1, sn is a trajectory generated by actions a0, ..., an−1 when:

C1. s0 ∈ b0,
C2. ak ∈ A(sk) for 0 ≤ k < n, and
C3. sk+1 ∈ F(sk, ak) for 0 ≤ k < n.

The plan a0, ..., an−1 is a (valid) solution to the model if every trajectory under a0, ..., an−1 is such that sn ∈ SG. A valid plan π is assigned a (worst-case) cost Vπ equal to the length of the longest trajectory starting in some s0 ∈ b0 and ending at a goal state. The plan is optimal if its value is minimal.
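Conditions C1–C3 suggest a direct validity check by belief-state progression: apply each action to every state in the current belief and take the union of the successors. The following sketch assumes the hypothetical ShortestPathModel structure above, with the initial belief b0 passed separately per M2'.

```python
def progress(belief, action, model):
    """Progress a non-deterministic belief state through one action.
    Returns None if the action is inapplicable in some possible state."""
    successors = set()
    for s in belief:
        if action not in model.applicable(s):
            return None                              # C2 fails for some state
        successors |= model.transition(s, action)    # C3: union of F(s, a)
    return successors

def is_valid_conformant_plan(plan, b0, model):
    """Valid iff every trajectory the plan generates ends in a goal state,
    i.e. the final belief state is contained in SG."""
    belief = set(b0)                                 # C1: any s0 in b0
    for action in plan:
        belief = progress(belief, action, model)
        if belief is None:
            return False
    return belief <= model.goals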
Fully-Observable Non-Deterministic Planning (FOND Planning)

Non-deterministic planning with full observability refers to deciding whether there exists a conditional plan that achieves the goal for a model satisfying M1–M6, where M5 is again redefined as:

M5'. a unit cost function c(s, a) = 1.

The main difference from conformant planning is that solutions are policies (partial functions) mapping states to actions, rather than linear sequences of operators. Let π : S → ⋃s∈S A(s) be a policy for model M1–M6, Sπ the domain of definition of π, and Sπ(s) the set of states reachable from s using π. Then we say that:

a) π is closed with respect to s iff Sπ(s) ⊆ Sπ;
b) π is proper with respect to s iff a goal state can be reached using π from every s′ ∈ Sπ(s);
c) π is acyclic with respect to s iff there is no trajectory s = s0, π(s0), ..., si, ..., sj, ..., sn with 0 ≤ i < j ≤ n and si = sj;
d) π is closed (resp. proper, acyclic) with respect to S′ ⊆ S iff it is closed (resp. proper, acyclic) with respect to every s ∈ S′.

A policy π is a valid solution for the non-deterministic model iff π is closed and proper with respect to the initial state s0 (a reachability-based check of these properties is sketched at the end of this section). A valid policy π is assigned a (worst-case) cost Vπ equal to the length of the longest trajectory starting at s0 and ending at a goal state. For policies that are acyclic with respect to s0, the cost Vπ is always well defined, i.e. < +∞. A policy π is optimal for model M1–M6 if it is a valid solution of minimum Vπ value.

The competition will only judge the cost of plans in non-deterministic domains that admit acyclic solutions, where optimal solutions always have finite cost. In non-deterministic domains that admit only cyclic solutions, planners will be judged solely by the time taken to generate a solution.

Fully-Observable Probabilistic Planning (FOP Planning)

Probabilistic planning problems, here stochastic shortest-path problems, can be described by models M1–M6 extended with:

M7. transition probabilities Pa(s′|s) > 0, for s′ ∈ F(s, a) and a ∈ A(s), such that Σs′∈F(s,a) Pa(s′|s) = 1.

In this case, solutions are again policies π that map states to actions. As in the fully-observable non-deterministic case, definitions a)–d) can be used to characterize the properties of π. A policy π is a valid solution if it is closed and proper with respect to s0. The cost Vπ assigned to a valid π is the expected cost incurred by the policy when it is applied from the initial state, i.e. Vπ is defined as Vπ(s0), where the function Vπ(·) is the unique solution to the Bellman equation giving Vπ(s) for states s ∉ SG:

Vπ(s) = c(s, π(s)) + Σs′∈F(s,π(s)) Pπ(s)(s′|s) Vπ(s′),

with Vπ(s) = 0 for s ∈ SG.
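Validity per definitions a) and b) reduces to reachability computations over the policy graph: closedness says π is defined on every non-goal state it can reach, and properness says a goal remains reachable from each such state. A minimal sketch, again assuming the hypothetical ShortestPathModel and a policy given as a Python dict:

```python
def policy_reachable(policy, model, s0):
    """S_pi(s0): states reachable from s0 by following pi (goals are absorbing)."""
    seen, frontier = {s0}, [s0]
    while frontier:
        s = frontier.pop()
        if s in model.goals or s not in policy:
            continue                     # stop at goals; keep undefined states in 'seen'
        for s2 in model.transition(s, policy[s]):
            if s2 not in seen:
                seen.add(s2)
                frontier.append(s2)
    return seen

def is_valid_policy(policy, model, s0):
    """Closed: every reachable non-goal state is in the domain of pi.
    Proper: from every reachable state, some goal is reachable under pi."""
    reach = policy_reachable(policy, model, s0)
    if any(s not in policy and s not in model.goals for s in reach):
        return False                     # not closed with respect to s0
    return all(any(g in model.goals for g in policy_reachable(policy, model, s))
               for s in reach)           # proper with respect to s0
```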
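For the probabilistic case, the cost of a fixed, closed and proper policy can be obtained by iterating the Bellman equation above until convergence (the corresponding linear system could equally be solved directly). In this sketch the probabilities Pa(s′|s) of M7 are supplied as a function prob(s, a, s2); every name is illustrative, as before.

```python
def evaluate_policy(policy, model, prob, tol=1e-9):
    """Iteratively solve V(s) = c(s, pi(s)) + sum_{s'} P_pi(s)(s'|s) V(s'),
    with V(s) = 0 for s in SG; the iteration converges for proper policies."""
    V = {s: 0.0 for s in policy}
    while True:
        delta = 0.0
        for s, a in policy.items():
            if s in model.goals:
                continue                           # V(s) = 0 on goal states
            v = model.cost(s, a) + sum(prob(s, a, s2) * V.get(s2, 0.0)
                                       for s2 in model.transition(s, a))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V                               # V[s0] is the policy cost Vpi
```

The returned value V[s0] then matches the definition of Vπ above.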