Relativized Options: Choosing the Right Transformation

Relativized options combine model minimization methods with a hierarchical reinforcement learning framework to derive compact, reduced representations of a related family of tasks. Relativized options are defined without an absolute frame of reference, and an option's policy is transformed appropriately based on the circumstances under which the option is invoked. In earlier work we addressed the problem of learning the option policy online. In this article we develop an algorithm for choosing, from among a set of candidate transformations, the right transformation for each member of the family of tasks.
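
The abstract leaves the selection mechanism unspecified. The following is a minimal, hypothetical sketch of one plausible scheme: each candidate transformation is weighted by how well the transformed experience agrees with the option's canonical model, and the highest-weighted candidate is used for a given task. The class name, the dictionary-based option model, and the multiplicative weight update are illustrative assumptions, not the algorithm developed in the article.

```python
from collections import defaultdict

class TransformationSelector:
    """Likelihood-weighted choice among candidate transformations for a
    relativized option (illustrative sketch, not the authors' exact method)."""

    def __init__(self, candidate_transforms, option_model, prior=1.0):
        # candidate_transforms: dict mapping a name to a function that projects a
        #   task-specific (state, action) pair into the option's canonical space.
        # option_model: dict mapping (canonical_state, canonical_action) to a
        #   dict of {canonical_next_state: probability}.
        self.transforms = candidate_transforms
        self.model = option_model
        # One weight per candidate transformation, kept separately for each task.
        self.weights = defaultdict(
            lambda: {name: prior for name in candidate_transforms}
        )

    def update(self, task_id, state, action, next_state, eps=1e-6):
        """Reweight each candidate by how well the option's canonical model
        predicts the observed transition after it has been transformed."""
        w = self.weights[task_id]
        for name, h in self.transforms.items():
            canon_s, canon_a = h(state, action)
            canon_next, _ = h(next_state, action)
            p = self.model.get((canon_s, canon_a), {}).get(canon_next, 0.0)
            w[name] *= max(p, eps)        # eps keeps weights strictly positive
        total = sum(w.values())
        for name in w:                    # renormalize to avoid underflow
            w[name] /= total

    def best(self, task_id):
        """Return the currently most likely transformation for this task."""
        w = self.weights[task_id]
        return max(w, key=w.get)
```

In use, `update` would be called on each transition observed while the option executes in a given task, and `best` would be queried to pick the transformation applied when the option is next invoked.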
