Composing Functions to Speed up Reinforcement Learning in a Changing World

This paper presents a system that transfers the results of prior learning to speed up reinforcement learning in a changing world. Often, even when the change to the world is relatively small, an extensive relearning effort is required. The new system exploits strong features in the multi-dimensional value function produced by reinforcement learning. These features induce a partitioning of the state space, and the partition is represented as a graph. The graph is used to index and compose functions stored in a case base, forming a close approximation to the solution of the new task. The experiments investigate one important example of a changing world: a new goal position. In this situation, the system achieves close to a two-orders-of-magnitude increase in learning rate over a basic reinforcement learning algorithm.
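
The abstract does not reproduce the algorithm, but the following minimal sketch illustrates the general idea of reusing prior learning when the goal moves: a case base of previously learned value functions is consulted before learning a new task, and the retrieved solution warm-starts tabular Q-learning instead of starting from zeros. All names here (GridWorld, q_learn, nearest_case, solve) are hypothetical, and nearest-goal retrieval is a crude stand-in for the paper's feature extraction, graph partitioning, and function composition.

```python
# Hedged sketch only: not the paper's algorithm. Nearest-goal lookup
# stands in for the graph-indexed composition of stored functions.
import random
from collections import defaultdict

class GridWorld:
    """Deterministic gridworld whose goal position can change between tasks."""
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

    def __init__(self, size, goal):
        self.size, self.goal = size, goal

    def step(self, state, action):
        dx, dy = self.ACTIONS[action]
        nxt = (min(max(state[0] + dx, 0), self.size - 1),
               min(max(state[1] + dy, 0), self.size - 1))
        return nxt, (1.0 if nxt == self.goal else 0.0), nxt == self.goal

def q_learn(env, q, episodes=300, alpha=0.5, gamma=0.95, eps=0.2):
    """Tabular Q-learning that starts from the supplied Q-table, so a
    warm start transferred from the case base carries over directly."""
    for _ in range(episodes):
        state, done = (0, 0), False
        for _ in range(400):  # cap episode length
            if random.random() < eps:
                action = random.randrange(4)
            else:
                action = max(range(4), key=lambda a: q[state][a])
            nxt, reward, done = env.step(state, action)
            q[state][action] += alpha * (reward + gamma * max(q[nxt])
                                         - q[state][action])
            state = nxt
            if done:
                break
    return q

case_base = {}  # goal position -> previously learned Q-table

def nearest_case(goal):
    """Retrieve the stored solution whose goal is closest in Manhattan
    distance (a crude stand-in for the paper's graph-based indexing)."""
    if not case_base:
        return None
    key = min(case_base,
              key=lambda g: abs(g[0] - goal[0]) + abs(g[1] - goal[1]))
    return case_base[key]

def solve(goal, size=8):
    """Learn a policy for a (possibly new) goal, warm-started from the
    case base when a related prior solution exists."""
    env = GridWorld(size, goal)
    q = defaultdict(lambda: [0.0] * 4)
    prior = nearest_case(goal)
    if prior is not None:
        for s, values in prior.items():
            q[s] = list(values)  # transfer the old value estimates
    q = q_learn(env, q)
    case_base[goal] = dict(q)
    return q
```

In a quick run, `solve((7, 7))` learns from scratch and caches its Q-table; a subsequent `solve((7, 5))` then starts from the transferred estimates rather than from zeros, which is the kind of reuse whose benefit the paper quantifies.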
