Effective Control Knowledge Transfer through Learning Skill and Representation Hierarchies

Learning capabilities of computer systems still lag far behind biological systems. One of the reasons can be seen in the inefficient re-use of control knowledge acquired over the lifetime of the artificial learning system. To address this deficiency, this paper presents a learning architecture which transfers control knowledge in the form of behavioral skills and corresponding representation concepts from one task to subsequent learning tasks. The presented system uses this knowledge to construct a more compact state space representation for learning while assuring bounded optimality of the learned task policy by utilizing a representation hierarchy. Experimental results show that the presented method can significantly outperform learning on a flat state space representation and the MAXQ method for hierarchical reinforcement learning.

[1]  C. Storrar Edinburgh , 1875, The Accountant’s Magazine.

[2]  Robert Givan,et al.  Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes , 1997, UAI.

[3]  R. Sutton Between MDPs and Semi-MDPs : Learning , Planning , and Representing Knowledge at Multiple Temporal Scales , 1998 .

[4]  Doina Precup,et al.  Between MOPs and Semi-MOP: Learning, Planning & Representing Knowledge at Multiple Temporal Scales , 1998 .

[5]  IT Kee-EungKim Solving Factored MDPs Using Non-homogeneous Partitions , 1998 .

[6]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[7]  Craig Boutilier,et al.  Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res..

[8]  Thomas G. Dietterich Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[9]  Kee-Eung Kim,et al.  Solving factored MDPs using non-homogeneous partitions , 2003, Artif. Intell..

[10]  Manfred Huber,et al.  Subgoal Discovery for Hierarchical Reinforcement Learning Using Learned Policies , 2003, FLAIRS.

[11]  Peter Stone,et al.  Behavior transfer for value-function-based reinforcement learning , 2005, AAMAS '05.

[12]  Bhaskara Marthi,et al.  Concurrent Hierarchical Reinforcement Learning , 2005, IJCAI.

[13]  Thomas G. Dietterich,et al.  Transfer Learning with an Ensemble of Background Tasks , 2005, NIPS 2005.

[14]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[15]  M. Huber,et al.  Accelerating Action Dependent Hierarchical Reinforcement Learning Through Autonomous Subgoal Discovery , 2005 .