Exploiting First-Order Regression in Inductive Policy Selection

We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function using first-order decision-theoretic regression and formula rewriting, while the former, when provided with a suitable hypothesis language, are capable of generalising value functions or policies from small instances. Our idea is to use reasoning, and in particular classical first-order regression, to automatically generate a hypothesis language dedicated to the domain at hand, which is then used as input by an inductive solver. This approach avoids the more complex reasoning of symbolic dynamic programming while focusing the inductive solver's attention on concepts that are specifically relevant to the optimal value function for the domain considered.
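To make the two-stage idea concrete, the sketch below is a deliberately simplified, hypothetical illustration rather than the paper's method: propositional STRIPS-style goal regression stands in for classical first-order regression, a toy logistics-like domain with made-up atom and action names supplies the dynamics, and a naive scoring rule stands in for a proper inductive solver. It shows the overall pipeline only: regression generates candidate formulas (the hypothesis language), and those formulas are then matched against states labelled with optimal actions from a small instance to select a rule-based policy.

```python
# Minimal sketch (assumptions throughout): STRIPS-style regression generates
# candidate formulas, which a trivial inducer then uses to pick a policy.
# Domain, atoms and action names are illustrative, not from the paper.

# An action is (name, preconditions, add effects, delete effects),
# each a frozenset of ground atoms represented as strings.
ACTIONS = [
    ("load",   frozenset({"at_pkg_A"}),               frozenset({"in_truck"}),   frozenset({"at_pkg_A"})),
    ("drive",  frozenset({"truck_at_A"}),             frozenset({"truck_at_B"}), frozenset({"truck_at_A"})),
    ("unload", frozenset({"in_truck", "truck_at_B"}), frozenset({"at_pkg_B"}),   frozenset({"in_truck"})),
]

GOAL = frozenset({"at_pkg_B"})

def regress(condition, action):
    """Classical regression of a conjunctive condition through a STRIPS action:
    the weakest condition that must hold before the action so that `condition`
    holds afterwards, or None if the action is irrelevant or destructive."""
    name, pre, add, dele = action
    if condition & dele:        # the action destroys part of the condition
        return None
    if not (condition & add):   # the action does not help achieve the condition
        return None
    return frozenset((condition - add) | pre)

def hypothesis_language(goal, actions, depth=3):
    """Generate candidate formulas by repeatedly regressing the goal."""
    frontier, language = {goal}, {goal}
    for _ in range(depth):
        frontier = {r for c in frontier for a in actions
                    if (r := regress(c, a)) is not None}
        language |= frontier
    return language

def induce_policy(examples, language, actions):
    """For each action, pick the candidate formula whose satisfaction best
    agrees with that action being optimal (a deliberately naive inducer)."""
    policy = []
    for action in actions:
        name = action[0]
        best = max(language, key=lambda f: sum(
            (f <= state) == (label == name) for state, label in examples))
        policy.append((best, name))
    return policy

if __name__ == "__main__":
    language = hypothesis_language(GOAL, ACTIONS)
    # Tiny training set: states from a small instance, labelled with the optimal action.
    examples = [
        (frozenset({"at_pkg_A", "truck_at_A"}), "load"),
        (frozenset({"in_truck", "truck_at_A"}), "drive"),
        (frozenset({"in_truck", "truck_at_B"}), "unload"),
    ]
    for formula, action in induce_policy(examples, language, ACTIONS):
        print(f"if {set(formula)} then {action}")
```

In this toy run the regressed formulas already separate the training states perfectly, so the selected rules form a sensible policy for the small instance; the paper's contribution lies in doing the analogous generation step with first-order regression so that the induced policy generalises across instance sizes.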
