Exploration in relational domains for model-based reinforcement learning

A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the ideas underlying the E3 and R-MAX algorithms. Efficient exploration in exponentially large state spaces must exploit the generalization of the learned model: what would count as a novel, exploration-worthy situation in a propositional setting may, in the relational setting, be a well-known context in which exploitation is promising. To address this, we introduce relational count functions, which generalize the classical notion of state and action visitation counts. Using these count functions, we provide guarantees on the exploration efficiency of our framework under the assumption that a relational KWIK learner and a near-optimal planner are available. We then propose a concrete exploration algorithm that integrates a practically efficient probabilistic rule learner with a relational planner (for which, however, no such guarantees exist) and employs the contexts of learned relational rules as features to model the novelty of states and actions. Our results in noisy 3D simulated robot manipulation problems and in domains of the International Planning Competition demonstrate that our approach is more effective than existing propositional and factored exploration techniques.
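To make the idea of relational count functions concrete, below is a minimal Python sketch, not the authors' implementation: it propositionalizes rule contexts as required sets of ground atoms (a full version would unify first-order predicates containing variables against the state), counts experience per context rather than per ground state-action pair, and applies an R-MAX-style "known" test. All names (`RelationalCounter`, `is_known`, the threshold `m`) are illustrative assumptions.

```python
from collections import defaultdict

class RelationalCounter:
    """Counts experience per abstract rule context rather than per
    ground state-action pair, so counts generalize across states."""

    def __init__(self, contexts):
        # Each context is a pair (action, required_atoms): a ground
        # (state, action) is covered if the action matches and all
        # required atoms hold in the state.
        self.contexts = contexts
        self.visits = defaultdict(int)  # visit count per context index

    def _covering(self, state, action):
        return [i for i, (a, atoms) in enumerate(self.contexts)
                if a == action and atoms <= state]

    def observe(self, state, action):
        for i in self._covering(state, action):
            self.visits[i] += 1

    def count(self, state, action):
        # Generalized visitation count: the minimum over covering
        # contexts is a conservative novelty estimate; a pair with no
        # covering context is maximally novel (count 0).
        cov = self._covering(state, action)
        return min((self.visits[i] for i in cov), default=0)

def is_known(counter, state, action, m=30):
    # R-MAX-style test: exploit in well-covered contexts; treat
    # unknown pairs optimistically (reward R_max) to drive exploration.
    return counter.count(state, action) >= m

# Two distinct ground states share the context {on(a,b), clear(a)},
# so experience gathered in one transfers to the other.
ctxs = [("grab(a)", frozenset({"on(a,b)", "clear(a)"}))]
rc = RelationalCounter(ctxs)
s1 = {"on(a,b)", "clear(a)", "on(b,table)"}
s2 = {"on(a,b)", "clear(a)", "on(c,table)"}
rc.observe(s1, "grab(a)")
print(rc.count(s2, "grab(a)"))  # -> 1: the count generalizes
```

In the algorithm described above, the contexts would come from learned probabilistic relational rules and the covering test would be a first-order matching procedure; the threshold `m` plays the role of the known-state parameter in E3/R-MAX.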
