Solving Relational MDPs with First-Order Machine Learning

We present a new formulation of Relational Markov Decision Processes (RMDPs) that is simpler than the situation-calculus approach of Boutilier, Reiter and Price. In addition, we describe our initial efforts toward a novel, machine-learning-based method for computing an RMDP's policy. Our technique instantiates the RMDP into a number of propositional MDPs, which are then solved for their value functions. First-order regression is then used to learn a value function for the complete RMDP. This value function can then be used to produce a policy for huge decision-theoretic planning problems, yielding compact solutions without requiring explicit state-space enumeration. Finally, we extend our RMDP formalism to cover the case of a dynamic universe, i.e., one in which action effects may create new objects or destroy existing ones.
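
To make the pipeline concrete, the following Python sketch illustrates the instantiate-solve-regress loop: ground the RMDP over several small universes, solve each resulting propositional MDP exactly by value iteration, and pool the resulting (state, value) pairs as training data for a first-order regression learner such as structural regression trees [22] or first-order regression [47]. This is only a minimal sketch: the functions instantiate_rmdp and describe, and the universes argument, are hypothetical placeholders, not the paper's actual interface.

```python
import numpy as np

def value_iteration(n_states, n_actions, P, R, gamma=0.9, eps=1e-6):
    """Solve a small ground (propositional) MDP exactly.

    P[a] is the (n_states x n_states) transition matrix for action a;
    R is the (n_states,) reward vector.
    """
    V = np.zeros(n_states)
    while True:
        # Bellman backup: Q(s,a) = R(s) + gamma * sum_s' P(s'|s,a) V(s')
        Q = np.stack([R + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < eps:
            return V_new
        V = V_new

def build_training_set(instantiate_rmdp, universes):
    """Pool (relational state description, value) examples across universes.

    instantiate_rmdp and describe are hypothetical: instantiate_rmdp
    grounds the RMDP for a given universe of objects, and describe maps
    a ground state index back to its first-order description, which is
    the representation the regression learner generalizes over.
    """
    examples = []
    for universe in universes:
        n_states, n_actions, P, R, describe = instantiate_rmdp(universe)
        V = value_iteration(n_states, n_actions, P, R)
        examples.extend((describe(s), V[s]) for s in range(n_states))
    return examples
```

Because the learner fits the value function in first-order terms rather than over ground states, the result can be applied greedily in universes of any size, which is what allows compact policies without enumerating the state space.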

[1] J. Rissanen et al. Modeling By Shortest Data Description, 1978, Autom.

[2] Randal E. Bryant et al. Graph-Based Algorithms for Boolean Function Manipulation, 1986, IEEE Transactions on Computers.

[3] David E. Smith. Controlling Backward Inference, 1989, Artif. Intell.

[4] Keiji Kanazawa et al. A model for reasoning about persistence and causation, 1989.

[5] Brian Falkenhainer et al. Dynamic Constraint Satisfaction Problems, 1990, AAAI.

[6] V. S. Subrahmanian et al. Probabilistic Logic Programming, 1992, Inf. Comput.

[7] Mark A. Peot et al. Conditional nonlinear planning, 1992.

[8] J. Ross Quinlan et al. C4.5: Programs for Machine Learning, 1992.

[9] Robert P. Goldman et al. From knowledge bases to decision models, 1992, The Knowledge Engineering Review.

[10] Oren Etzioni et al. An Approach to Planning with Incomplete Information, 1992, KR.

[11] Robert P. Goldman et al. Conditional Linear Planning, 1994, AIPS.

[12] Daniel S. Weld et al. Probabilistic Planning with Information Gathering and Contingent Execution, 1994, AIPS.

[13] Oren Etzioni et al. Omnipotence without Omniscience: Sensor Management in Planning, 1994, AAAI.

[14] Daniel S. Weld. An Introduction to Least Commitment Planning, 1994, AI Mag.

[15] Peter Haddawy et al. Generating Bayesian Networks from Probability Logic Knowledge Bases, 1994, UAI.

[16] Drew McDermott et al. Modeling a Dynamic and Uncertain World I: Symbolic and Probabilistic Reasoning About Change, 1994, Artif. Intell.

[17] Nicholas Kushmerick et al. An Algorithm for Probabilistic Planning, 1995, Artif. Intell.

[18] Avrim Blum et al. Fast Planning Through Planning Graph Analysis, 1995, IJCAI.

[19] Andrew G. Barto et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[20] S. Muggleton. Stochastic Logic Programs, 1996.

[21] Mark A. Peot et al. Suspending Recursion in Causal-Link Planning, 1996, AIPS.

[22] Stefan Kramer et al. Structural Regression Trees, 1996, AAAI/IAAI, Vol. 1.

[23] Gregg Collins et al. Planning for Contingencies: A Decision-based Approach, 1996, J. Artif. Intell. Res.

[24] David E. Smith et al. Conformant Graphplan, 1998, AAAI/IAAI.

[25] David E. Smith et al. Extending Graphplan to handle uncertainty and sensing actions, 1998, AAAI.

[26] Jim Blythe et al. Planning Under Uncertainty in Dynamic Domains, 1998.

[27] Ronen I. Brafman et al. Structured Reachability Analysis for Markov Decision Processes, 1998, UAI.

[28] Daphne Koller et al. Computing Factored Value Functions for Policies in Structured MDPs, 1999, IJCAI.

[29] Craig Boutilier et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.

[30] Lise Getoor et al. Learning Probabilistic Relational Models, 1999, IJCAI.

[31] John Langford et al. Probabilistic Planning in the Graphplan Framework, 1999, ECP.

[32] Jesse Hoey et al. SPUDD: Stochastic Planning using Decision Diagrams, 1999, UAI.

[33] Daphne Koller et al. Policy Iteration for Factored MDPs, 2000, UAI.

[34] Marco Roveri et al. Conformant Planning via Symbolic Model Checking, 2000, J. Artif. Intell. Res.

[35] Blai Bonet et al. Planning with Incomplete Information as Heuristic Search in Belief Space, 2000, AIPS.

[36] Ben Taskar et al. Learning Probabilistic Models of Relational Structure, 2001, ICML.

[37] Luc De Raedt et al. Adaptive Bayesian Logic Programs, 2001, ILP.

[38] Carlos Guestrin et al. Max-norm Projections for Factored MDPs, 2001, IJCAI.

[39] Craig Boutilier et al. Symbolic Dynamic Programming for First-Order MDPs, 2001, IJCAI.

[40] Luc De Raedt et al. Towards Combining Inductive Logic Programming with Bayesian Networks, 2001, ILP.

[41] Pedro M. Domingos et al. Relational Markov models and their application to adaptive web navigation, 2002, KDD.

[42] Piergiorgio Bertoli et al. Improving Heuristics for Planning as Search in Belief Space, 2002, AIPS.

[43] Zhengzhu Feng et al. Symbolic heuristic search for factored Markov decision processes, 2002, AAAI/IAAI.

[44] Robert Givan et al. Inductive Policy Selection for First-Order MDPs, 2002, UAI.

[45] Carlos Guestrin et al. Generalizing plans to new environments in relational MDPs, 2003, IJCAI.

[46] Pedro M. Domingos et al. Dynamic Probabilistic Relational Models, 2003, IJCAI.

[47] Ivan Bratko et al. First Order Regression, 1997, Machine Learning.