Generalizing plans to new environments in relational MDPs

A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning. Such generalization can both reduce planning time and allow us to tackle domains larger than those tractable for direct planning. In this paper, we present an approach to the generalization problem based on a new framework of relational Markov Decision Processes (RMDPs). An RMDP can model a set of similar environments by representing objects as instances of different classes. In order to generalize plans to multiple environments, we define an approximate value function specified in terms of classes of objects and, in a multiagent setting, classes of agents. This class-based approximate value function is optimized relative to a sampled subset of environments and computed using an efficient linear programming method. We prove that a polynomial number of sampled environments suffices to achieve performance close to that achievable when optimizing over the entire space. Our experimental results show that our method generalizes plans successfully to new, significantly larger environments, with minimal loss of performance relative to environment-specific planning. We demonstrate our approach on a real strategic computer war game.
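To make the class-based construction concrete, the sketch below illustrates (in Python) one way such a value function could be fit with the standard linear-programming formulation of approximate dynamic programming: the value of a world is a weighted sum of per-class features, weights are shared by all objects of the same class, and Bellman-style constraints are collected over a sampled subset of environments. This is a minimal illustration, not the paper's implementation; the environment interface (env.classes, env.objects, env.states, env.actions, env.transitions, env.reward, obj.local_feature) is hypothetical, and the exact expectation over successor states would in practice be replaced by the factored techniques the paper relies on.

```python
# Minimal sketch of a class-based approximate value function fit by the LP
# approach to approximate dynamic programming.  All environment/feature
# names are hypothetical placeholders.
import numpy as np
from scipy.optimize import linprog

GAMMA = 0.9  # discount factor (assumed)

def class_features(env, state):
    """Per-class features: sum an object-level feature over all objects of
    each class, so the learned weights are tied within a class."""
    feats = np.zeros(len(env.classes))
    for obj in env.objects:
        feats[env.classes.index(obj.cls)] += obj.local_feature(state)
    return feats

def fit_class_weights(sampled_envs):
    """Solve  min_w  sum_{env, s} w . phi(s)
       s.t.   w . phi(s) >= R(s, a) + GAMMA * E[w . phi(s') | s, a]
       over the sampled environments, where phi are class-based features."""
    n = len(sampled_envs[0].classes)
    c = np.zeros(n)        # LP objective: total value over sampled states
    A_ub, b_ub = [], []    # one Bellman constraint per (env, state, action)
    for env in sampled_envs:
        for s in env.states:
            phi_s = class_features(env, s)
            c += phi_s
            for a in env.actions(s):
                # expected successor features under P(s' | s, a)
                exp_phi = sum(p * class_features(env, s2)
                              for s2, p in env.transitions(s, a))
                # rewrite  w.phi(s) >= R + GAMMA * w.exp_phi  as
                # (GAMMA * exp_phi - phi_s) . w <= -R
                A_ub.append(GAMMA * exp_phi - phi_s)
                b_ub.append(-env.reward(s, a))
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n)
    return res.x  # one weight vector per object class
```

Because the weights are indexed by class rather than by individual object, the same weight vector can be applied directly to a new environment with more objects, which is what allows the resulting plan to generalize without replanning.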
