Learning MDP Action Models Via Discrete Mixture Trees

This paper addresses the problem of learning dynamic Bayesian network (DBN) models to support reinforcement learning. It focuses on learning regression tree (context-specific dependence) models of the conditional probability distributions of the DBNs. Existing algorithms rely on standard regression tree learning methods (both propositional and relational). However, such methods presume that the stochasticity in the domain can be modeled as a deterministic function with additive noise. This is inappropriate for many RL domains, where the stochasticity instead takes the form of a stochastic choice among deterministic functions. This paper introduces a regression tree algorithm in which each leaf node is modeled as a finite mixture of deterministic functions, approximated via a greedy set-cover procedure. Experiments on three challenging RL domains show that this approach finds trees that are more accurate and that are more likely to correctly identify the conditional dependencies in the DBNs from small samples.
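The core idea at each leaf, selecting a small set of deterministic outcome functions that together explain the observed transitions, can be illustrated with the classic greedy set-cover approximation the abstract refers to. The sketch below is illustrative only: the candidate functions, data, and names are hypothetical, not the paper's implementation. Each candidate function "covers" the transitions it predicts exactly, and greedy set cover repeatedly picks the function covering the most still-unexplained transitions.

```python
# Illustrative sketch (not the paper's code): approximate a leaf's
# stochastic dynamics as a small mixture of deterministic functions,
# chosen by the standard greedy set-cover heuristic.

def greedy_set_cover(universe, subsets):
    """Greedily choose named subsets until `universe` is covered.

    At each step, pick the subset covering the most still-uncovered
    elements -- the classic ln(n)-approximation for set cover.
    Returns (chosen_names, elements_left_uncovered).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name, gained = max(((n, s & uncovered) for n, s in subsets.items()),
                           key=lambda kv: len(kv[1]))
        if not gained:          # remaining transitions explained by no candidate
            break
        chosen.append(name)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical observed (state, next_state) transitions at one leaf.
transitions = [(0, 1), (1, 2), (2, 2), (3, 3), (4, 0)]

# Hypothetical candidate deterministic outcome functions.
candidates = {
    "increment": lambda s: s + 1,
    "stay":      lambda s: s,
    "reset":     lambda s: 0,
}

# Each candidate covers exactly the transitions it predicts correctly.
covers = {name: {t for t in transitions if f(t[0]) == t[1]}
          for name, f in candidates.items()}

chosen, uncovered = greedy_set_cover(transitions, covers)
# Rough mixture weights: fraction of transitions each chosen function explains
# (here coverage is disjoint; overlaps would need a proper attribution rule).
weights = {n: len(covers[n]) / len(transitions) for n in chosen}
```

Here all three functions are needed: "increment" explains (0,1) and (1,2), "stay" explains (2,2) and (3,3), and "reset" explains (4,0), yielding mixture weights of roughly 0.4, 0.4, and 0.2.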
