Automatic Discovery and Transfer of Task Hierarchies in Reinforcement Learning

Sequential decision tasks present many opportunities for the study of transfer learning. Chief among these is the existence of multiple domains that share the same underlying causal structure for actions. We describe an approach that exploits this shared causal structure to discover a hierarchical task structure in a source domain, which in turn speeds up the learning of task-execution knowledge in a new target domain. Our approach is theoretically justified, and the discovered hierarchies compare favorably to manually designed task hierarchies in terms of learning efficiency in the target domain. We demonstrate that causally motivated task hierarchies transfer more robustly than other kinds of detailed knowledge that depend on the idiosyncrasies of the source domain and are hence less transferable.
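To make the central idea concrete, below is a minimal illustrative sketch (not the paper's algorithm) of how a causal action model can induce a task hierarchy. It assumes each action is annotated with the state variables it reads and the single variable it affects, in the spirit of a DBN action model; the names `ActionModel` and `build_hierarchy`, and the toy taxi-like domain, are hypothetical.

```python
# Illustrative sketch: derive a simple task hierarchy from a causal action model.
# Assumption: each action lists the variables it reads ("parents") and the one
# variable it changes ("effect"), roughly as in a DBN action model.

from collections import namedtuple
import json

ActionModel = namedtuple("ActionModel", ["name", "parents", "effect"])

def build_hierarchy(models, goal_var):
    """The subtask for a variable contains every action that directly sets it,
    plus child subtasks for each variable those actions causally depend on."""
    direct = [m for m in models if m.effect == goal_var]
    needed = sorted({p for m in direct for p in m.parents if p != goal_var})
    return {
        "achieves": goal_var,
        "actions": [m.name for m in direct],
        "subtasks": [build_hierarchy(models, v) for v in needed],
    }

if __name__ == "__main__":
    # Toy taxi-like domain: pick up a passenger, then deliver them.
    models = [
        ActionModel("navigate_to_passenger", parents=["taxi_loc"], effect="at_passenger"),
        ActionModel("pickup", parents=["at_passenger"], effect="has_passenger"),
        ActionModel("navigate_to_dest", parents=["taxi_loc", "has_passenger"], effect="at_dest"),
        ActionModel("putdown", parents=["at_dest", "has_passenger"], effect="delivered"),
    ]
    print(json.dumps(build_hierarchy(models, "delivered"), indent=2))
```

Because the grouping depends only on the causal structure of the actions, the resulting hierarchy can be reused in any target domain that shares that structure, even when transition probabilities or rewards differ.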
