Hierarchical Reinforcement Learning

The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers and tasks as well as scheduling of tasks. The presented algorithm enables stable and fast learning for complex, real-world domains in the parallel multitask and sequential transfer case. We show that the investigated types of hierarchy enable positive transfer while partially mitigating negative interference and evaluate the benefits of additional incentives for efficient, compositional task solutions in single task domains. Finally, we demonstrate substantial data-efficiency and final performance gains over competitive baselines in a week-long, physical robot stacking experiment.
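To make the hierarchical structure described above concrete, the sketch below shows one plausible form of such a policy: a set of low-level Gaussian controllers shared across all tasks, combined by a task-conditioned high-level controller that outputs mixture weights over them. This is only a minimal illustration in PyTorch; the class name, network sizes, and layer layout are assumptions for exposition, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MixturePolicy(nn.Module):
    """Illustrative hierarchical policy: shared low-level Gaussian controllers,
    task-specific high-level mixture weights (a sketch, not the RHPO code)."""

    def __init__(self, obs_dim, act_dim, num_tasks, num_components=4, hidden=64):
        super().__init__()
        # Low-level controllers: shared across all tasks.
        self.components = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                          nn.Linear(hidden, 2 * act_dim))   # mean and log-std
            for _ in range(num_components)
        ])
        # High-level controller: task-conditioned weights over the components.
        self.scheduler = nn.Sequential(
            nn.Linear(obs_dim + num_tasks, hidden), nn.Tanh(),
            nn.Linear(hidden, num_components)
        )
        self.act_dim = act_dim

    def forward(self, obs, task_onehot):
        logits = self.scheduler(torch.cat([obs, task_onehot], dim=-1))
        weights = torch.distributions.Categorical(logits=logits)
        means, log_stds = [], []
        for comp in self.components:
            mu, log_std = comp(obs).split(self.act_dim, dim=-1)
            means.append(mu)
            log_stds.append(log_std.clamp(-5.0, 2.0))
        means = torch.stack(means, dim=-2)          # (..., K, act_dim)
        stds = torch.stack(log_stds, dim=-2).exp()
        comps = torch.distributions.Independent(
            torch.distributions.Normal(means, stds), 1)
        # Marginalising over the component index yields the flat action distribution.
        return torch.distributions.MixtureSameFamily(weights, comps)

# Usage example: sample an action for task 0 from a 12-D observation.
policy = MixturePolicy(obs_dim=12, act_dim=4, num_tasks=3)
obs = torch.randn(1, 12)
task = torch.eye(3)[[0]]
action = policy(obs, task).sample()                # shape (1, 4)
```

Because the low-level components are shared while only the high-level weights are task-specific, off-policy transitions gathered for one task can, in principle, be reused to update the controllers for all tasks, which is the kind of data sharing the abstract refers to.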
