Linear options

Learning, planning, and representing knowledge in large state spaces at multiple levels of temporal abstraction are key, long-standing challenges for building flexible autonomous agents. The options framework provides a formal mechanism for specifying and learning temporally extended skills. Although past work has demonstrated the benefit of acting according to options in continuous state spaces, a central advantage of temporal abstraction, namely the ability to plan with a temporally abstract model, remains difficult to realize when the number of environment states is large or infinite. In this work, we develop a knowledge construct, the linear option, that can model temporally abstract dynamics in continuous state spaces. We show that planning with a linear expectation model of an option's dynamics converges to a fixed point with low Temporal Difference (TD) error. Next, building on recent work on linear feature selection, we give conditions under which a linear feature set is sufficient to accurately represent the value function of an option policy. We extend this result to give conditions under which multiple options may be repeatedly composed to create new options that themselves have accurate linear models. Finally, we demonstrate linear option learning and planning algorithms in a simulated robot environment.
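To make the construct concrete, here is a minimal Python sketch of what a linear option model and a Dyna-style planning backup could look like under the standard linear-expectation-model formulation: the option's dynamics over a d-dimensional feature vector phi(s) are summarized by a matrix F and a vector b, with F phi(s) approximating the expected discounted features at termination and b . phi(s) the expected discounted reward accumulated while the option runs. The class and function names, step sizes, and update rules below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class LinearOption:
    """Hypothetical linear expectation model (F, b) of one option's dynamics.

    F phi(s)  ~ E[ gamma^tau * phi(s_tau) ]   (discounted termination features)
    b . phi(s) ~ E[ sum_{t < tau} gamma^t r_t ] (discounted reward during the option)
    """

    def __init__(self, num_features, step_size=0.1):
        self.F = np.zeros((num_features, num_features))  # transition model
        self.b = np.zeros(num_features)                  # reward model
        self.alpha = step_size

    def update_model(self, phi_start, phi_end, discounted_return, gamma_tau):
        """One gradient-style model update from a completed option execution.

        phi_start         -- features where the option was initiated
        phi_end           -- features where the option terminated
        discounted_return -- sum of gamma^t * r_t accumulated during the option
        gamma_tau         -- gamma ** tau, the discount at termination
        """
        # Move F phi_start toward the observed discounted termination features.
        transition_err = gamma_tau * phi_end - self.F @ phi_start
        self.F += self.alpha * np.outer(transition_err, phi_start)
        # Move b . phi_start toward the observed discounted return.
        reward_err = discounted_return - self.b @ phi_start
        self.b += self.alpha * reward_err * phi_start


def plan_step(theta, options, phi, alpha=0.1):
    """One Dyna-style planning backup of the value weights theta at features phi.

    Applies a TD-like update toward the best option model's prediction:
        delta = max_o [ b_o . phi + theta . (F_o phi) ] - theta . phi
    """
    backup = max(o.b @ phi + theta @ (o.F @ phi) for o in options)
    delta = backup - theta @ phi
    return theta + alpha * delta * phi
```

Note that because the discount gamma^tau is folded into F, composing two such models is (roughly) just linear algebra: executing option 1 and then option 2 yields the transition model F2 @ F1 and the reward model b1 + F1.T @ b2, which is the sense in which composed options can retain accurate linear models.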
