Policy and Value Transfer in Lifelong Reinforcement Learning

We consider the problem of how best to use prior experience to bootstrap lifelong learning, where an agent faces a series of task instances drawn from some task distribution. First, we identify the initial policy that optimizes expected performance over the distribution of tasks for increasingly complex classes of policy and task distributions. We empirically demonstrate the relative performance of each policy class’ optimal element in a variety of simple task distributions. We then consider value-function initialization methods that preserve PAC guarantees while simultaneously minimizing the learning required in two learning algorithms, yielding MAXQINIT, a practical new method for value-function-based transfer. We show that MAXQINIT performs well in simple lifelong RL experiments.

[1]  Andrew G. Barto,et al.  Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.

[2]  Karl Johan Åström,et al.  Optimal control of Markov processes with incomplete state information , 1965 .

[3]  M. Littman,et al.  Toward Good Abstractions for Lifelong Learning , 2017 .

[4]  Michael I. Jordan,et al.  Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.

[5]  Eric Wiewiora,et al.  Potential-Based Shaping and Q-Value Initialization are Equivalent , 2003, J. Artif. Intell. Res..

[6]  Sam Devlin,et al.  Expressing Arbitrary Reward Functions as Potential-Based Advice , 2015, AAAI.

[7]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[8]  Marie desJardins,et al.  Portable Option Discovery for Automated Learning Transfer in Object-Oriented Markov Decision Processes , 2015, IJCAI.

[9]  Lihong Li,et al.  PAC model-free reinforcement learning , 2006, ICML.

[10]  Manuela M. Veloso,et al.  Probabilistic policy reuse in a reinforcement learning agent , 2006, AAMAS '06.

[11]  Andrew G. Barto,et al.  Autonomous shaping: knowledge transfer in reinforcement learning , 2006, ICML.

[12]  Yoonsuck Choe,et al.  Directed Exploration in Reinforcement Learning with Transferred Knowledge , 2012, EWRL.

[13]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[14]  Sham M. Kakade,et al.  On the sample complexity of reinforcement learning. , 2003 .

[15]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[16]  Shimon Whiteson,et al.  Transfer via inter-task mappings in policy search reinforcement learning , 2007, AAMAS '07.

[17]  Sam Devlin,et al.  Dynamic potential-based reward shaping , 2012, AAMAS.

[18]  Stefanie Tellex,et al.  Goal-Based Action Priors , 2015, ICAPS.

[19]  Peter Stone,et al.  Cross-domain transfer for reinforcement learning , 2007, ICML '07.

[20]  Lihong Li,et al.  PAC-inspired Option Discovery in Lifelong Reinforcement Learning , 2014, ICML.

[21]  Eyal Amir,et al.  Bayesian Inverse Reinforcement Learning , 2007, IJCAI.

[22]  Sebastian Thrun,et al.  Issues in Using Function Approximation for Reinforcement Learning , 1999 .

[23]  Lihong Li,et al.  The Online Discovery Problem and Its Application to Lifelong Reinforcement Learning , 2015, ArXiv.

[24]  Benjamin Rosman,et al.  What good are actions? Accelerating learning using learned action priors , 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).

[25]  Matthew E. Taylor,et al.  Policy Transfer using Reward Shaping , 2015, AAMAS.

[26]  Andrew G. Barto,et al.  PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning , 2002, ICML.

[27]  Sheldon M. Ross,et al.  Introduction to Stochastic Dynamic Programming: Probability and Mathematical , 1983 .

[28]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[29]  Alan Fern,et al.  Multi-task reinforcement learning: a hierarchical Bayesian approach , 2007, ICML '07.

[30]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[31]  Lihong Li,et al.  Incremental Model-based Learners With Formal Learning-Time Guarantees , 2006, UAI.

[32]  Thomas J. Walsh Transferring State Abstractions Between MDPs , 2006 .

[33]  Satinder Singh Transfer of learning by composing solutions of elemental sequential tasks , 2004, Machine Learning.

[34]  Peter Stone,et al.  Representation Transfer for Reinforcement Learning , 2007, AAAI Fall Symposium: Computational Approaches to Representation Change during Learning and Development.

[35]  Eric Eaton,et al.  Using Task Features for Zero-Shot Knowledge Transfer in Lifelong Learning , 2016, IJCAI.

[36]  Michael L. Littman,et al.  Potential-based Shaping in Model-based Reinforcement Learning , 2008, AAAI.

[37]  Lihong Li,et al.  Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..

[38]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..