Learning domain structure through probabilistic policy reuse in reinforcement learning

Policy Reuse is a transfer learning approach that improves a reinforcement learner with guidance from previously learned, similar policies. The method uses the past policies as a probabilistic bias: at each step, the learner chooses among exploiting the policy currently being learned, exploring random unexplored actions, and exploiting a past policy. In this work, we demonstrate that Policy Reuse also contributes to learning the structure of a domain. Interestingly, and almost as a side effect, Policy Reuse identifies classes of similar policies, revealing a basis of core-policies of the domain. We show theoretically that, under a set of conditions, reusing such a set of core-policies bounds from below the expected gain obtained while learning a new policy. In general, Policy Reuse contributes to the overall goal of lifelong reinforcement learning, as (i) it incrementally builds a policy library; (ii) it provides a mechanism to reuse past policies; and (iii) it learns an abstract domain structure in terms of the core-policies of the domain.
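The probabilistic bias can be made concrete with the π-reuse exploration strategy at the core of Policy Reuse. Below is a minimal sketch, not the paper's implementation: it assumes a hypothetical tabular environment interface (`env.reset()`, `env.step()`, `env.actions`) and illustrative parameter names, with `psi` as the probability of following the past policy and `upsilon` as its per-step decay rate.

```python
import random
from collections import defaultdict

def pi_reuse_episode(Q, past_policy, env, alpha=0.05, gamma=0.95,
                     epsilon=0.1, psi=1.0, upsilon=0.95, max_steps=100):
    """One episode of a pi-reuse-style exploration strategy (sketch).

    At each step the agent follows the past policy with probability psi,
    and otherwise acts epsilon-greedily on the Q-function of the policy
    being learned; psi decays by upsilon, so the bias toward the past
    policy fades as the episode progresses.
    """
    state = env.reset()
    gain, discount = 0.0, 1.0
    for _ in range(max_steps):
        if random.random() < psi:            # exploit a past policy
            action = past_policy(state)
        elif random.random() < epsilon:      # explore a random action
            action = random.choice(env.actions)
        else:                                # exploit the new policy
            action = max(env.actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)

        # Standard Q-learning update on the policy being learned.
        best_next = max(Q[(next_state, a)] for a in env.actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])

        gain += discount * reward
        discount *= gamma
        psi *= upsilon                       # decay the reuse bias
        if done:
            break
        state = next_state
    return gain

# Usage sketch (assumes a hypothetical `gridworld` environment and a
# library of previously learned policies):
# Q = defaultdict(float)
# for pi_past in policy_library:
#     W = pi_reuse_episode(Q, pi_past, gridworld)
```

In the full PRQ-Learning algorithm, the gain returned by each episode updates a running average W_i for the reused policy i, and the policy to follow in the next episode (a past policy or the greedy policy currently being learned) is drawn via a softmax over those averages. It is these reuse gains that reveal which past policies are similar to the new task, and hence which policies form the core-policy basis of the domain.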
