Exploration in relational domains for model-based reinforcement learning

A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the ideas underlying the E3 and R-MAX algorithms. Efficient exploration in exponentially large state spaces must exploit the generalization of the learned model: what would count as a novel, exploration-worthy situation in a propositional setting may, in the relational setting, be a well-known context in which exploitation is promising. To address this, we introduce relational count functions, which generalize the classical notion of state and action visitation counts. Using these count functions, we provide guarantees on the exploration efficiency of our framework under the assumption that a relational KWIK learner and a near-optimal planner are available. We then propose a concrete exploration algorithm that integrates a practically efficient probabilistic rule learner with a relational planner (for which, however, no such guarantees exist) and employs the contexts of learned relational rules as features to model the novelty of states and actions. Our results in noisy 3D simulated robot manipulation problems and in domains of the International Planning Competition demonstrate that our approach is more effective than existing propositional and factored exploration techniques.
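To make the idea of relational count functions concrete, below is a minimal Python sketch, not the authors' implementation: it propositionalizes rule contexts as required sets of ground atoms (a full version would unify first-order predicates containing variables against the state), counts experience per context rather than per ground state-action pair, and applies an R-MAX-style "known" test. All names (`RelationalCounter`, `is_known`, the threshold `m`) are illustrative assumptions.

```python
from collections import defaultdict

class RelationalCounter:
    """Counts experience per abstract rule context rather than per
    ground state-action pair, so counts generalize across states."""

    def __init__(self, contexts):
        # Each context is a pair (action, required_atoms): a ground
        # (state, action) is covered if the action matches and all
        # required atoms hold in the state.
        self.contexts = contexts
        self.visits = defaultdict(int)  # visit count per context index

    def _covering(self, state, action):
        return [i for i, (a, atoms) in enumerate(self.contexts)
                if a == action and atoms <= state]

    def observe(self, state, action):
        for i in self._covering(state, action):
            self.visits[i] += 1

    def count(self, state, action):
        # Generalized visitation count: the minimum over covering
        # contexts is a conservative novelty estimate; a pair with no
        # covering context is maximally novel (count 0).
        cov = self._covering(state, action)
        return min((self.visits[i] for i in cov), default=0)

def is_known(counter, state, action, m=30):
    # R-MAX-style test: exploit in well-covered contexts; treat
    # unknown pairs optimistically (reward R_max) to drive exploration.
    return counter.count(state, action) >= m

# Two distinct ground states share the context {on(a,b), clear(a)},
# so experience gathered in one transfers to the other.
ctxs = [("grab(a)", frozenset({"on(a,b)", "clear(a)"}))]
rc = RelationalCounter(ctxs)
s1 = {"on(a,b)", "clear(a)", "on(b,table)"}
s2 = {"on(a,b)", "clear(a)", "on(c,table)"}
rc.observe(s1, "grab(a)")
print(rc.count(s2, "grab(a)"))  # -> 1: the count generalizes
```

In the algorithm described above, the contexts would come from learned probabilistic relational rules and the covering test would be a first-order matching procedure; the threshold `m` plays the role of the known-state parameter in E3/R-MAX.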
