Automatic Derivation of Memoryless Policies and Finite-State Controllers Using Classical Planners

Finite-state and memoryless controllers are simple action selection mechanisms widely used in domains such as video games and mobile robotics. Memoryless controllers are functions that map observations into actions, while finite-state controllers generalize memoryless ones with a finite amount of memory. In contrast to the policies obtained from MDPs and POMDPs, finite-state controllers have two advantages: they are often extremely compact, involving a small number of controller states or none at all, and they are general, applying to many problems rather than just one. A limitation of finite-state controllers is that they must be written by hand. In this work, we address this limitation and develop a method for deriving finite-state controllers automatically from models. These models represent a class of contingent problems where actions are deterministic and some fluents are observable. The problem of deriving a controller from such models is converted into a conformant planning problem that is solved using classical planners, taking advantage of a complete translation introduced recently. The controllers derived in this way are 'general' in the sense that they solve not only the original problem but many variations as well, including changes in the size of the problem or in the uncertainty of the initial situation and action effects. Experiments illustrating the derivation of such controllers are presented.
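To make the controller model concrete, here is a minimal Python sketch, not taken from the paper: a finite-state controller represented as a table mapping (controller state, observation) pairs to (action, next controller state) pairs, with a memoryless controller being the special case with a single controller state. The hall problem, the state names q0/q1, and the observe/act interface are all illustrative assumptions.

```python
# A minimal sketch (not from the paper) of a finite-state controller as a
# transition table from (controller state, observation) to
# (action, next controller state). A memoryless controller is the special
# case with a single controller state.

# Hypothetical controller for a 1-D "hall": walk right until a wall is
# observed, then walk left until the opposite wall is observed, then stop.
CONTROLLER = {
    ("q0", "no-wall"): ("move-right", "q0"),
    ("q0", "wall"):    ("move-left",  "q1"),
    ("q1", "no-wall"): ("move-left",  "q1"),
    ("q1", "wall"):    ("stop",       "q1"),
}

def run_controller(controller, observe, act, q="q0", max_steps=100):
    """Execute a finite-state controller in a closed loop.

    observe() returns the current observation; act(a) applies action a in
    the environment. Both are assumed to be supplied by the caller.
    """
    for _ in range(max_steps):
        obs = observe()
        action, q = controller[(q, obs)]
        if action == "stop":
            break
        act(action)

if __name__ == "__main__":
    pos = 2  # agent starts somewhere inside a hall with cells 0..4

    def observe():
        return "wall" if pos in (0, 4) else "no-wall"

    def act(a):
        global pos
        pos += 1 if a == "move-right" else -1

    run_controller(CONTROLLER, observe, act)
    print("final position:", pos)  # expected: 0 (left end of the hall)
```

Note that this table-driven representation is what makes such controllers compact and portable across problem variations: the same four entries drive the agent to the left end of a hall of any length, which illustrates the sense in which the derived controllers are 'general'.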
