Reinforcement Learning with Hierarchies of Machines

We present a new approach to reinforcement learning in which the policies considered by the learning process are constrained by hierarchies of partially specified machines. This allows for the use of prior knowledge to reduce the search space and provides a framework in which knowledge can be transferred across problems and in which component solutions can be recombined to solve larger and more complicated problems. Our approach can be seen as providing a link between reinforcement learning and "behavior-based" or "teleo-reactive" approaches to control. We present provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrate their effectiveness on a problem with several thousand states.
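To make the constraint idea concrete, here is a minimal sketch of learning with a hierarchy of partially specified machines. The corridor environment, the specific sub-machines, and the update rule below are illustrative assumptions, not the paper's implementation: the top-level machine leaves only a choice point unspecified (which sub-machine to call), each sub-machine is a fully specified behavior, and a Q-update is applied only at choice points, in the spirit of the learning algorithm the abstract refers to.

```python
import random
from collections import defaultdict

GAMMA, ALPHA, EPS = 0.99, 0.1, 0.1
N_CELLS = 12  # hypothetical corridor; reward at the rightmost cell

def env_step(s, a):
    """Primitive environment step: a is -1 (left) or +1 (right)."""
    s2 = max(0, min(N_CELLS - 1, s + a))
    return s2, (1.0 if s2 == N_CELLS - 1 else -0.01)

def run_submachine(s, a, steps=3):
    """A fully specified sub-machine: repeat one primitive action a few
    times, then stop. Returns the resulting state, the accumulated
    discounted reward, and the compound discount factor gamma**k."""
    total, discount = 0.0, 1.0
    for _ in range(steps):
        s, r = env_step(s, a)
        total += discount * r
        discount *= GAMMA
        if s == N_CELLS - 1:
            break
    return s, total, discount

# The only part of the policy left open to learning: which sub-machine
# ("go-left" or "go-right") the top machine calls at each choice point.
CHOICES = [-1, +1]
Q = defaultdict(float)  # indexed by (environment state, choice)

for episode in range(2000):
    s = 0
    while s != N_CELLS - 1:
        if random.random() < EPS:                      # explore
            c = random.choice(CHOICES)
        else:                                          # exploit
            c = max(CHOICES, key=lambda x: Q[(s, x)])
        s2, reward, disc = run_submachine(s, c)
        # SMDP-style backup across the whole sub-machine call.
        target = reward + disc * max(Q[(s2, x)] for x in CHOICES)
        Q[(s, c)] += ALPHA * (target - Q[(s, c)])
        s = s2
```

Because updates happen only where the machine hierarchy leaves a choice open, the learner searches a far smaller policy space than flat Q-learning over primitive actions, which is the source of the reduction in search space the abstract describes.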
