Inductive Policy Selection for First-Order MDPs

We select policies for large Markov Decision Processes (MDPs) with compact first-order representations. We seek policies that generalize well as the number of objects in the domain grows, potentially without bound. Existing dynamic-programming approaches based on flat, propositional, or first-order representations are either impractical here or do not naturally scale as the number of objects grows without bound. We implement and evaluate an alternative approach that induces first-order policies from training data constructed by solving small problem instances with PGraphplan (Blum & Langford, 1999). Our policies are represented as ensembles of decision lists, using a taxonomic concept language. This approach extends the work of Martin and Geffner (2000) to stochastic domains, ensemble learning, and a wider variety of problems. Empirically, we find "good" policies for several stochastic first-order MDPs that are beyond the scope of previous approaches. We also discuss the application of this work to the relational reinforcement-learning problem.
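
To make the policy representation concrete, here is a minimal Python sketch (not from the paper) of a decision-list policy and a bagged ensemble of such lists that chooses actions by majority vote. The encoding of states as sets of ground atoms and all names (`Rule`, `DecisionListPolicy`, `EnsemblePolicy`) are illustrative assumptions; the paper's rule conditions are expressed in a taxonomic concept language rather than the opaque predicates used here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# A state is modeled as a frozenset of ground atoms, e.g. {"on(a,b)", "clear(a)"}.
State = frozenset

@dataclass
class Rule:
    """One decision-list rule: if `condition` holds in the state, propose an action."""
    condition: Callable[[State], bool]
    action_for: Callable[[State], str]

class DecisionListPolicy:
    """Ordered rule list: the first rule whose condition fires decides the action."""
    def __init__(self, rules, default="noop"):
        self.rules = rules
        self.default = default

    def act(self, state: State) -> str:
        for rule in self.rules:
            if rule.condition(state):
                return rule.action_for(state)
        return self.default

class EnsemblePolicy:
    """Bagging-style ensemble: each member decision list votes; the majority wins."""
    def __init__(self, members):
        self.members = members

    def act(self, state: State) -> str:
        votes = Counter(member.act(state) for member in self.members)
        return votes.most_common(1)[0][0]

# Toy usage in a blocks-world-like domain (illustrative only):
unstack_rule = Rule(
    condition=lambda s: "on(a,b)" in s and "clear(a)" in s,
    action_for=lambda s: "unstack(a,b)",
)
policy = EnsemblePolicy([DecisionListPolicy([unstack_rule]) for _ in range(3)])
print(policy.act(frozenset({"on(a,b)", "clear(a)"})))  # -> unstack(a,b)
```

In the paper's setting, each member list would be induced from trajectories obtained by solving small problem instances with PGraphplan, with bagging (Breiman, 1996) supplying the ensemble; here the lists are hand-built purely for illustration.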

[1] Ronald A. Howard. Dynamic Programming and Markov Processes. 1960.

[2] Richard Fikes and Nils J. Nilsson. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. IJCAI, 1971.

[3] Allen Newell and Herbert A. Simon. Human Problem Solving. 1973.

[4] Ronald L. Rivest. Learning Decision Lists. Machine Learning, 1987.

[5] David A. McAllester. Observations on Cognitive Judgments. AAAI, 1991.

[6] David A. McAllester and Robert Givan. Taxonomic Syntax for First Order Inference. Journal of the ACM, 1989.

[7] Maja J. Mataric. Reward Functions for Accelerated Learning. ICML, 1994.

[8] Bart Selman. Near-Optimal Plans, Tractability, and Reactivity. KR, 1994.

[9] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1994.

[10] Thomas Dean, Leslie Pack Kaelbling, Jak Kirman, and Ann Nicholson. Planning under Time Constraints in Stochastic Domains. Artificial Intelligence, 1993.

[11] Thomas Dean and Robert Givan. Model Minimization in Markov Decision Processes. AAAI, 1997.

[12] Saso Dzeroski, Luc De Raedt, and Hendrik Blockeel. Relational Reinforcement Learning. ILP, 1998.

[13] Craig Boutilier, Thomas Dean, and Steve Hanks. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage. Journal of Artificial Intelligence Research, 1999.

[14] Avrim L. Blum and John Langford. Probabilistic Planning in the Graphplan Framework. ECP, 1999.

[15] Roni Khardon. Learning Action Strategies for Planning Domains. Artificial Intelligence, 1999.

[16] Craig Boutilier, Richard Dearden, and Moises Goldszmidt. Stochastic Dynamic Programming with Factored Representations. Artificial Intelligence, 2000.

[17] Mario Martin and Hector Geffner. Learning Generalized Policies in Planning Using Concept Languages. KR, 2000.

[18] Carlos Guestrin, Daphne Koller, and Ronald Parr. Max-norm Projections for Factored MDPs. IJCAI, 2001.

[19] Craig Boutilier, Raymond Reiter, and Bob Price. Symbolic Dynamic Programming for First-Order MDPs. IJCAI, 2001.

[20] Leo Breiman. Bagging Predictors. Machine Learning, 1996.

[21] Sean R. Eddy. What is dynamic programming? Nature Biotechnology, 2004.