Self-segmentation of sequences

The paper presents an approach to hierarchical reinforcement learning that does not rely on a priori hierarchical structures, and therefore tackles a more difficult problem than existing work does. The approach learns to segment sequences, creating hierarchical structures on the basis of the reinforcement received during task execution. The different levels of control communicate with each other by sharing the reinforcement estimates that each of them obtains. The algorithm segments sequences to reduce non-Markovian temporal dependencies and thereby make the overall task easier to learn. Initial experiments demonstrated the basic promise of the approach.
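To make the idea of segmenting a sequence to remove a non-Markovian dependency concrete, here is a small Python sketch. It is not the paper's algorithm. The toy task and the names `TARGETS`, `reward`, `best_reactive_module` and `best_segmentation` are hypothetical, and the segmentation point is found by exhaustive search over boundaries rather than learned incrementally from reinforcement as the abstract describes. In the toy task the agent sees the same observation at every step, but the rewarded action changes partway through, so a single memoryless (reactive) module cannot be optimal, while two modules that each control one segment can.

```python
from typing import List, Tuple

# Hypothetical toy task (not from the paper): the observation never changes,
# but the rewarded action switches from 0 to 1 after step 1. The dependency on
# elapsed time is invisible to a memoryless policy, i.e. it is non-Markovian.
TARGETS = [0, 0, 1, 1, 1]
ACTIONS = [0, 1]


def reward(t: int, action: int) -> int:
    """Reinforcement signal: 1 if the action matches the hidden target at step t."""
    return 1 if action == TARGETS[t] else 0


def best_reactive_module(steps: range) -> Tuple[int, int]:
    """Best constant action for a memoryless module controlling `steps`.

    Because the observation is identical at every step, a reactive module can
    only emit one action; it is chosen by summing the reinforcement collected.
    Returns (action, total_reward).
    """
    scored = [(sum(reward(t, a) for t in steps), a) for a in ACTIONS]
    total, action = max(scored)
    return action, total


def best_segmentation(horizon: int) -> Tuple[int, List[int], int]:
    """Try every boundary that splits the episode into two module-controlled segments.

    Returns (boundary, [action_of_module_1, action_of_module_2], total_reward).
    """
    best = None
    for boundary in range(horizon + 1):
        a1, r1 = best_reactive_module(range(0, boundary))
        a2, r2 = best_reactive_module(range(boundary, horizon))
        candidate = (r1 + r2, boundary, [a1, a2])
        if best is None or candidate[0] > best[0]:
            best = candidate
    total, boundary, actions = best
    return boundary, actions, total


if __name__ == "__main__":
    horizon = len(TARGETS)
    print("one reactive module:", best_reactive_module(range(horizon)))
    print("two segments:", best_segmentation(horizon))
```

With a single reactive module the best achievable return is 3 out of 5 (always choosing action 1). Splitting the episode at step 2 lets the first module always choose 0 and the second always choose 1, recovering the full return of 5: once the sequence is segmented, each segment is solvable by a memoryless policy.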
