Relational Representations and Traces for Efficient Reinforcement Learning

This chapter introduces an approach to reinforcement learning based on a relational representation that: (i) can be applied over large search spaces, (ii) can incorporate domain knowledge, and (iii) can reuse previously learned policies on different, but similar, problems. The underlying idea is to represent states as sets of first-order relations, to describe actions in terms of those relations, and to learn policies over this generalized representation. We show how this representation produces powerful abstractions, and that policies learned over it can be applied directly, without further learning, to other problems characterized by the same set of relations. To accelerate the learning process, we present an extension in which the user provides traces of the tasks to be learned. These traces are used to select only a small subset of the possible actions, speeding up the convergence of the learning algorithms. The effectiveness of the approach is tested on a flight simulator and on a mobile robot.

DOI: 10.4018/978-1-60960-165-2.ch009
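The chapter itself develops the full algorithm; as a rough illustration only, the sketch below shows the core idea in Python: tabular Q-learning indexed by a relational abstraction of states (r-states), with optional user-provided traces restricting the candidate actions. All names here (r_state, RelationalQLearner, traces) are hypothetical, and the abstraction shown (keeping only the relation symbols that hold in a state) is a deliberate simplification of the chapter's first-order representation.

```python
import random
from collections import defaultdict

def r_state(ground_relations):
    """Abstract a concrete state (a set of ground relations such as
    {("at", "robot", "room1")}) to the set of relation symbols that
    hold in it, so all states satisfying the same relations coincide."""
    return frozenset(rel[0] for rel in ground_relations)

class RelationalQLearner:
    """Tabular Q-learning over r-states and r-actions (a sketch)."""

    def __init__(self, actions, traces=None, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (r-state, r-action) -> estimated value
        self.actions = actions        # list of r-actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Optional user traces: r-state -> actions actually demonstrated.
        # Restricting exploration to these actions shrinks the search
        # space, in the spirit of the trace extension described above.
        self.traces = traces or {}

    def candidate_actions(self, rs):
        return self.traces.get(rs, self.actions)

    def choose(self, rs):
        """Epsilon-greedy choice among the trace-filtered actions."""
        acts = self.candidate_actions(rs)
        if random.random() < self.epsilon:
            return random.choice(acts)
        return max(acts, key=lambda a: self.q[(rs, a)])

    def update(self, rs, action, reward, rs_next):
        """Standard one-step Q-learning backup on the abstract space."""
        best_next = max(self.q[(rs_next, a)] for a in self.candidate_actions(rs_next))
        td_error = reward + self.gamma * best_next - self.q[(rs, action)]
        self.q[(rs, action)] += self.alpha * td_error
```

Because values are indexed by r-states rather than ground states, a policy learned in one environment can, under this scheme, be applied unchanged to any other environment describable by the same set of relations, which is the transfer property the abstract highlights.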
