Model-Assisted Approaches for Relational Reinforcement Learning: Some challenges for the SRL community

For a relational reinforcement learning (RRL) agent, learning a model of the world can be very helpful. In many situations, however, learning a perfect model is not possible, so only probabilistic methods that can take uncertainty into account can be used to exploit the collected knowledge. RRL therefore offers an interesting testbed for statistical relational learning (SRL) methods. In this paper, we describe an algorithm that takes a middle ground between model-free and model-based (relational) reinforcement learning. A model of the world dynamics, in the form of a relational Dynamic Bayesian Network (DBN), is learned incrementally. Empirical results show that sampling from the partially learned model outperforms traditional RRL Q-learners. We also highlight a number of open problems. First, other SRL techniques, besides the one we use, could serve just as well; it would be interesting to see what their strengths and weaknesses are in the specific RRL context. In addition, our approach typically yields chunks of partial knowledge, and little is known about how to combine, evaluate, and exploit such partial knowledge more efficiently.
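The loop sketched in the abstract is Dyna-like: update from real experience, incrementally refine a learned transition model, and generate additional value updates by sampling the partially learned model. The following is a minimal sketch of that idea only, not the paper's actual algorithm: the `env` object with `legal_actions(state)` and `execute(state, action)`, the tabular Q-function, and the tabular stand-in for the relational DBN are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Minimal Dyna-style sketch (assumed, not the authors' implementation):
# a tabular transition model stands in for the incrementally learned
# relational DBN, and `env` is a hypothetical environment interface.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_IMAGINED = 10  # imagined (model-sampled) updates per real step

Q = defaultdict(float)      # Q[(state, action)] -> estimated value
model = defaultdict(list)   # model[(state, action)] -> observed (next_state, reward) outcomes

def epsilon_greedy(state, actions):
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next, next_actions):
    best_next = max(Q[(s_next, a2)] for a2 in next_actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def step(env, state):
    actions = env.legal_actions(state)
    a = epsilon_greedy(state, actions)
    s_next, r = env.execute(state, a)

    # 1. Model-free Q-learning update from the real experience.
    q_update(state, a, r, s_next, env.legal_actions(s_next))

    # 2. Incrementally refine the learned model with the new observation.
    model[(state, a)].append((s_next, r))

    # 3. Model-assisted updates: sample transitions from the partial model.
    for _ in range(N_IMAGINED):
        (s, act), outcomes = random.choice(list(model.items()))
        s2, rew = random.choice(outcomes)
        q_update(s, act, rew, s2, env.legal_actions(s2))

    return s_next
```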
