Exploration in Relational Worlds

One of the key problems in model-based reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large relational domains, in which there is a varying number of objects and relations between them. We provide one of the first solutions to exploring large relational Markov decision processes by developing relational extensions of the concepts of the Explicit Explore or Exploit (E3) algorithm. A key insight is that the inherent generalization of learned knowledge in the relational representation also has profound implications for the exploration strategy: what in a propositional setting would be considered a novel situation and worth exploring may in the relational setting be an instance of a well-known context in which exploitation is promising. Our experimental evaluation on noisy, simulated 3D robot manipulation problems shows the effectiveness and benefit of relational exploration over several propositional benchmark approaches.
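
To make the key insight concrete, below is a minimal sketch (not the paper's implementation) of an E3-style known-state test lifted to a relational setting. Experience counts are aggregated over relational abstractions of states and actions (the hypothetical helpers `abstract_state` and `lift` below) rather than over ground states, so a ground state that has never been visited can still count as "known" when its lifted context has been experienced often enough, and the agent exploits instead of exploring.

```python
from collections import defaultdict

# Illustrative sketch of an E3-style exploration test lifted to a relational
# setting. All names below (abstract_state, lift, type_of, KNOWN_THRESHOLD)
# are assumptions for this example, not the paper's actual components.

KNOWN_THRESHOLD = 20          # hypothetical visit count above which a context is "known"
counts = defaultdict(int)     # visits per (abstract state, abstract action) pair


def type_of(obj):
    """Toy typing: strip the object index, e.g. 'cube3' -> 'cube'."""
    return obj.rstrip("0123456789")


def lift(atom):
    """Replace object constants in an atom by their types,
    e.g. ('on', ('cube3', 'cube7')) -> ('on', ('cube', 'cube'))."""
    pred, args = atom
    return (pred, tuple(type_of(a) for a in args))


def abstract_state(ground_state):
    """Relational abstraction of a ground state given as a set of atoms."""
    return frozenset(lift(atom) for atom in ground_state)


def record_experience(ground_state, action):
    counts[(abstract_state(ground_state), lift(action))] += 1


def is_known(ground_state, actions):
    """A state is known if every action has been tried often enough in its
    relational context -- the propositional E3 test applied to abstractions."""
    s = abstract_state(ground_state)
    return all(counts[(s, lift(a))] >= KNOWN_THRESHOLD for a in actions)


def choose_mode(ground_state, actions):
    # E3-style decision: exploit inside the known part of the model,
    # otherwise explore toward unknown contexts.
    return "exploit" if is_known(ground_state, actions) else "explore"


# Usage: a state over concrete objects, counted at the level of its lifted context.
state = {("on", ("cube3", "cube7")), ("clear", ("cube3",))}
record_experience(state, ("grab", ("cube3",)))
print(choose_mode(state, [("grab", ("cube3",))]))  # "explore" until the context is well known
```

A propositional learner that keys its counts on ground states would treat every new object configuration as unexplored; keying the counts on the lifted context is what allows exploitation to kick in much earlier in large relational domains.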
