Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors

This paper presents an approach to hierarchical reinforcement learning that does not rely on a priori, domain-specific knowledge of hierarchical structure, and thus tackles a harder problem than existing work. The approach learns to segment action sequences into hierarchical structures (for example, to deal with partially observable Markov decision processes using multiple limited-memory or memoryless modules). Segmentation is driven by the reinforcement received during task execution, and the different levels of control communicate by sharing the reinforcement estimates each obtains. The algorithm segments action sequences so as to reduce non-Markovian temporal dependencies, and seeks configurations of long- and short-range dependencies that facilitate learning the overall task. The resulting hierarchies also support the extraction of explicit hierarchical plans. Initial experiments demonstrate the promise of the approach.
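
To make the mechanism concrete, the sketch below shows a minimal two-level hierarchy of tabular Q-learners in Python. It is only a schematic illustration under simplifying assumptions (a Gym-style reset()/step() environment, segmentation at fixed intervals, and invented class names such as LowLevelModule and HighLevelController), not the algorithm from the paper: a memoryless lower-level module controls each subsequence, and the higher-level controller is updated with the reinforcement accumulated over the segment, so that the levels share reinforcement estimates.

```python
# A minimal two-level hierarchical Q-learning sketch illustrating the kind of
# mechanism the paper describes (NOT the paper's exact algorithm). A memoryless
# lower-level module controls each subsequence; a higher-level controller is
# updated with the reinforcement accumulated over that segment. The environment
# interface (reset/step) and all class names are illustrative assumptions.
import random
from collections import defaultdict


class LowLevelModule:
    """Memoryless Q-learner: maps the current observation directly to an action."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # (observation, action) -> value estimate
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def value(self, obs):
        return max(self.q[(obs, a)] for a in self.actions)

    def update(self, obs, action, reward, next_obs):
        target = reward + self.gamma * self.value(next_obs)
        self.q[(obs, action)] += self.alpha * (target - self.q[(obs, action)])


class HighLevelController:
    """Q-learner over modules: at each segmentation point it selects which
    lower-level module controls the next subsequence."""

    def __init__(self, n_modules, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # (observation, module index) -> value estimate
        self.n_modules = n_modules
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_module(self, obs):
        if random.random() < self.epsilon:
            return random.randrange(self.n_modules)
        return max(range(self.n_modules), key=lambda m: self.q[(obs, m)])

    def update(self, obs, module, segment_return, next_obs, discount, done):
        # SMDP-style update: the discounted reinforcement accumulated by the
        # lower level over the segment is shared upward as the high-level reward.
        bootstrap = 0.0 if done else max(self.q[(next_obs, m)]
                                         for m in range(self.n_modules))
        target = segment_return + discount * bootstrap
        self.q[(obs, module)] += self.alpha * (target - self.q[(obs, module)])


def run_episode(env, controller, modules, max_segment_len=5):
    """One episode. Segmentation here happens every `max_segment_len` steps,
    a fixed rule used purely for illustration; the paper instead learns where
    to segment from the reinforcement received."""
    obs, done = env.reset(), False
    while not done:
        seg_start = obs
        module_idx = controller.choose_module(obs)
        module = modules[module_idx]
        segment_return, discount = 0.0, 1.0
        for _ in range(max_segment_len):
            action = module.act(obs)
            next_obs, reward, done = env.step(action)  # assumed env interface
            module.update(obs, action, reward, next_obs)
            segment_return += discount * reward
            discount *= controller.gamma
            obs = next_obs
            if done:
                break
        controller.update(seg_start, module_idx, segment_return, obs, discount, done)
```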
