A hierarchical approach to efficient reinforcement learning in deterministic domains

Factored representations, model-based learning, and hierarchies are well-studied techniques for improving the learning efficiency of reinforcement-learning algorithms in large state spaces. We bring these three ideas together in a new algorithm that addresses two open problems from the reinforcement-learning literature and solves them in deterministic domains. First, we show how learned models can improve learning speed in the hierarchy-based MAXQ framework without disrupting opportunities for state abstraction. Second, we show how hierarchies can augment existing factored exploration algorithms to achieve not only low sample complexity for learning but provably efficient planning as well. We illustrate the resulting performance gains in example domains, and we prove polynomial bounds on the computational effort needed to attain near-optimal performance within the hierarchy.
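
The sketch below illustrates only the model-based, optimism-driven exploration ingredient the abstract refers to, specialized to a deterministic tabular MDP where a single observation of a state-action pair fully determines its outcome. It is a minimal illustration under those assumptions: the class and parameter names are invented for this example, and the hierarchical (MAXQ-style) and factored aspects of the actual algorithm are omitted.

```python
from collections import defaultdict


class DeterministicRmax:
    """Minimal sketch of R-max-style exploration in a deterministic,
    tabular MDP. Because the domain is deterministic, one observation
    of (state, action) makes the learned model exact for that pair;
    unknown pairs are treated optimistically to drive exploration.

    Illustrative only: the paper's algorithm additionally exploits a
    MAXQ-like hierarchy and factored state, which are not shown here.
    """

    def __init__(self, actions, r_max=1.0, gamma=0.95, sweeps=100):
        self.actions = list(actions)
        self.r_max = r_max            # optimistic reward for unknown pairs
        self.gamma = gamma
        self.sweeps = sweeps          # value-iteration sweeps per planning call
        self.model = {}               # (s, a) -> (next_state, reward)
        self.values = defaultdict(float)

    def observe(self, state, action, reward, next_state):
        # One sample suffices in a deterministic domain.
        self.model[(state, action)] = (next_state, reward)

    def act(self, state):
        # Plan in the optimistic model, then act greedily.
        self._value_iteration()
        return max(self.actions, key=lambda a: self._q(state, a))

    def _q(self, state, action):
        if (state, action) not in self.model:
            # Optimism in the face of uncertainty: unknown pairs are
            # assumed to yield the maximum discounted return.
            return self.r_max / (1.0 - self.gamma)
        next_state, reward = self.model[(state, action)]
        return reward + self.gamma * self.values[next_state]

    def _value_iteration(self):
        # Synchronous value iteration over the states seen so far.
        states = {s for (s, _) in self.model}
        states |= {ns for (ns, _) in self.model.values()}
        for _ in range(self.sweeps):
            self.values = defaultdict(
                float,
                {s: max(self._q(s, a) for a in self.actions) for s in states},
            )
```

In a hierarchical version, each subtask would run this kind of optimistic planning over its own (abstracted) model rather than over the full flat state space, which is where the sample-complexity and planning-time savings claimed in the abstract come from.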
