Classical, Probabilistic, and Contingent Planning: Three Models, One Algorithm

Various forms of planning in AI can be viewed as problems of sequential decision making that differ only in whether the effects of actions are predictable and/or observable. Three simple mathematical models, search, MDPs, and POMDPs, provide the conceptual framework for understanding such problems, but suitable languages and algorithms are needed to model and solve them effectively. In this paper, we analyze planning from this perspective and report the performance of a simple but general planning algorithm on various planning tasks.

Actions. We take an abstract view of planning as a problem of sequential decision making in which actions have to be executed to achieve a goal. We distinguish two important aspects of actions: whether their effects are predictable, and whether they are observable. We assume throughout that time is discrete. In deterministic action models, the effect of actions is completely predictable and can be represented by a transition function f such that s' = f(s, a) is the unique state that follows state s after action a. Probabilistic action models, on the other hand, are represented by transition probabilities P_a(s'|s), which must sum to one over the possible successor states s' of s and a. Observations provide feedback from the environment and determine the form of plans: in the absence of feedback, the plan is a sequence of actions; in the presence of feedback, the plan is a function of the observations. We distinguish three cases: when the effect of actions is fully observable, partially observable, or not observable at all. (The first code sketch below contrasts the two action models.)

Plans. We refer to planning in the presence of observations as closed-loop planning, and planning in the absence of observations as open-loop planning [23, 13, 1]. Classical planning is open-loop planning, while contingent and reactive planning are forms of closed-loop planning. The two forms of planning depend on assumptions about observability, not on whether actions are deterministic or probabilistic. Closed-loop planning is normally regarded as superior because it is more robust: it can recover from perturbations (e.g., a block falling off the gripper) and from errors in the initial conditions or action models (e.g., assuming that actions are deterministic when they are not). Open-loop plans cannot recover. For this reason, the execution of open-loop plans is usually extended with mechanisms for plan switching, replanning, etc. (The second sketch below illustrates this distinction.)

Models. Three standard mathematical models of sequential decisions allow us to make precise …
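To make the two action models concrete, here is a minimal Python sketch; it is our illustration, not code from the paper, and all class and function names (DeterministicModel, ProbabilisticModel, next_state) are assumptions made for exposition.

```python
import random

class DeterministicModel:
    """Deterministic actions: s' = f(s, a) is the unique successor state."""
    def __init__(self, f):
        self.f = f  # transition function f(s, a) -> s'

    def next_state(self, s, a):
        return self.f(s, a)

class ProbabilisticModel:
    """Probabilistic actions: P_a(s'|s) over successors, summing to one."""
    def __init__(self, p):
        self.p = p  # p(s, a) -> dict mapping each successor s' to P_a(s'|s)

    def next_state(self, s, a):
        dist = self.p(s, a)
        assert abs(sum(dist.values()) - 1.0) < 1e-9  # probabilities add to one
        states, probs = zip(*dist.items())
        return random.choices(states, weights=probs)[0]

# A toy one-dimensional domain: 'right' moves deterministically in the first
# model, and succeeds only with probability 0.8 in the second.
det = DeterministicModel(lambda s, a: s + 1 if a == "right" else s - 1)
prob = ProbabilisticModel(
    lambda s, a: {s + 1: 0.8, s: 0.2} if a == "right" else {s - 1: 0.8, s: 0.2})

print(det.next_state(0, "right"))   # always 1
print(prob.next_state(0, "right"))  # 1 with probability 0.8, else 0
```

Note that the deterministic model is the special case where a single successor has probability one, which is why one algorithm can, in principle, address both settings.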

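The open-loop versus closed-loop distinction can likewise be sketched in a few lines. Again, this is an illustrative assumption rather than the paper's algorithm: execute_open_loop applies a fixed action sequence blindly, while execute_closed_loop re-selects each action from the observed state and can therefore recover when an action fails.

```python
import random

def execute_open_loop(plan, s, step):
    """Open-loop: apply a fixed action sequence; no feedback is used."""
    for a in plan:
        s = step(s, a)
    return s

def execute_closed_loop(policy, s, step, goal, max_steps=100):
    """Closed-loop: pick each action from the observed state; stop at the goal."""
    for _ in range(max_steps):
        if s == goal:
            return s
        s = step(s, policy(s))
    return s

# A noisy step function: 'right' advances with probability 0.8, else stays put
# (a perturbation analogous to the block falling off the gripper).
def step(s, a):
    return s + 1 if a == "right" and random.random() < 0.8 else s

goal = 3
print(execute_open_loop(["right"] * 3, 0, step))              # may stop short of 3
print(execute_closed_loop(lambda s: "right", 0, step, goal))  # reaches 3 with high probability
```

Under deterministic actions the two executors coincide; under the noisy step, only the closed-loop executor reliably reaches the goal, which is the robustness argument made above.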
[1] Richard Fikes et al. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving, 1971, IJCAI.

[2] Edward J. Sondik et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.

[3] Keiji Kanazawa et al. A model for reasoning about persistence and causation, 1989.

[4] Richard E. Korf et al. Real-Time Heuristic Search, 1990, Artif. Intell.

[5] Richard S. Sutton et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.

[6] Michael P. Wellman et al. Planning and Control, 1991.

[7] Oren Etzioni et al. An Approach to Planning with Incomplete Information, 1992, KR.

[8] Leslie Pack Kaelbling et al. Planning With Deadlines in Stochastic Domains, 1993, AAAI.

[9] Leslie Pack Kaelbling et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.

[10] Daniel S. Weld. An Introduction to Least Commitment Planning, 1994, AI Mag.

[11] Nicholas Kushmerick et al. An Algorithm for Probabilistic Planning, 1995, Artif. Intell.

[12] Gregg Collins et al. Planning Under Uncertainty: Some Key Issues, 1995, IJCAI.

[13] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach, 1995.

[14] Avrim Blum et al. Fast Planning Through Planning Graph Analysis, 1995, IJCAI.

[15] Leslie Pack Kaelbling et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.

[16] Andrew G. Barto et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[17] Bart Selman et al. Pushing the Envelope: Planning, Propositional Logic and Stochastic Search, 1996, AAAI/IAAI, Vol. 2.

[18] Hector J. Levesque et al. What Is Planning in the Presence of Sensing?, 1996, AAAI/IAAI, Vol. 2.

[19] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming, 1996, Athena Scientific.

[20] Drew McDermott et al. A Heuristic Estimator for Means-Ends Analysis in Planning, 1996, AIPS.

[21] T. Dean et al. Planning under uncertainty: structural assumptions and computational leverage, 1996.

[22] Blai Bonet et al. A Robust and Fast Action Selection Mechanism for Planning, 1997, AAAI/IAAI.

[23] Blai Bonet et al. Learning Sorting and Decision Trees with POMDPs, 1998, ICML.

[24] Blai Bonet. High-Level Planning and Control with Incomplete Information Using POMDP's, 1998.

[25] M. Kurano et al. Markov decision processes with fuzzy rewards (Perspective and problem for Dynamic Programming with uncertainty), 2001.