Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics

Temporally extended actions are usually effective in speeding up reinforcement learning. In this paper we present a mechanism for automatically constructing such actions, expressed as options [24], in a finite Markov Decision Process (MDP). To do this, we compute a bisimulation metric [7] between the states of a small MDP and the states of a large MDP that we want to solve. The shape of this metric is then used to completely define a set of options for the large MDP. We demonstrate empirically that our approach improves the speed of reinforcement learning and is generally insensitive to parameter tuning.
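
To make the construction concrete, the sketch below shows the two ingredients the abstract names: a bisimulation-style metric between the states of a small source MDP and a large target MDP, computed by fixed-point iteration with a Kantorovich (optimal-transport) term, and a toy option construction that follows the source policy of the metrically closest source state. This is an illustration under stated assumptions, not the paper's exact algorithm: it assumes the two MDPs share one action set, and the names kantorovich, bisimulation_metric, and options_from_metric, as well as the termination rule, are hypothetical.

    import numpy as np
    from scipy.optimize import linprog

    def kantorovich(p, q, cost):
        """Wasserstein-1 distance between distributions p (over source states)
        and q (over target states) under ground cost matrix cost[i, j],
        solved as an optimal-transport linear program."""
        n, m = len(p), len(q)
        A_eq, b_eq = [], []
        for i in range(n):                    # row marginals of the flow equal p
            row = np.zeros((n, m)); row[i, :] = 1.0
            A_eq.append(row.ravel()); b_eq.append(p[i])
        for j in range(m):                    # column marginals equal q
            col = np.zeros((n, m)); col[:, j] = 1.0
            A_eq.append(col.ravel()); b_eq.append(q[j])
        res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                      bounds=(0, None))
        return res.fun

    def bisimulation_metric(R1, P1, R2, P2, c_r=0.5, c_t=0.5, tol=1e-6, iters=100):
        """Iterate d(s, t) = max_a [ c_r * |R1[s, a] - R2[t, a]|
                                     + c_t * W_d(P1[s, a], P2[t, a]) ]
        to a fixed point. R1, R2: (S, A) reward arrays; P1, P2: (S, A, S)
        transition arrays; both MDPs are assumed to share the action set."""
        n1, n2, A = R1.shape[0], R2.shape[0], R1.shape[1]
        d = np.zeros((n1, n2))
        for _ in range(iters):
            d_new = np.empty_like(d)
            for s in range(n1):
                for t in range(n2):
                    d_new[s, t] = max(
                        c_r * abs(R1[s, a] - R2[t, a])
                        + c_t * kantorovich(P1[s, a], P2[t, a], d)
                        for a in range(A))
            if np.max(np.abs(d_new - d)) < tol:
                return d_new
            d = d_new
        return d

    def options_from_metric(d, source_policy):
        """Hypothetical option construction: each target state is matched to its
        metrically closest source state and follows that state's source-policy
        action; an option terminates when the closest source state changes."""
        closest = d.argmin(axis=0)            # nearest source state per target state
        actions = np.array([source_policy[c] for c in closest])
        terminate = lambda t, t_next: closest[t] != closest[t_next]
        return closest, actions, terminate

With c_r + c_t = 1 and c_t < 1 the update is a contraction, so the iteration converges; the transport subproblem is the standard linear-programming form of the Kantorovich distance. The per-state-pair LP makes this practical only when one of the MDPs is small, which is consistent with computing the metric against a small source MDP.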

[1] Doina Precup, et al. Learning Options in Reinforcement Learning. SARA, 2002.

[2] M. Rehm, et al. Proceedings of AAMAS, 2005.

[3] Doina Precup, et al. Bounding Performance Loss in Approximate MDP Homomorphisms. NIPS, 2008.

[4] Shie Mannor, et al. Dynamic abstraction in reinforcement learning via clustering. ICML, 2004.

[5] Peter Dayan, et al. Technical Note: Q-Learning. Machine Learning, 1992.

[6] Satinder P. Singh, et al. Transfer via soft homomorphisms. AAMAS, 2009.

[7] Doina Precup, et al. Using Bisimulation for Policy Transfer in MDPs. AAAI, 2010.

[8] Alicia P. Wolfe, et al. Identifying useful subgoals in reinforcement learning by local graph partitioning. ICML, 2005.

[9] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. J. Artif. Intell. Res., 1999.

[10] Andrew G. Barto, et al. Causal Graph Based Decomposition of Factored MDPs. J. Mach. Learn. Res., 2006.

[11] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines. NIPS, 1997.

[12] John R. Anderson. ACT: A simple theory of complex cognition, 1996.

[13] Michael Wooldridge, et al. Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009.

[14] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell., 1999.

[15] Peter Dayan, et al. Q-learning. Machine Learning, 1992.

[16] Jean-Daniel Zucker, et al. Abstraction, Reformulation and Approximation: 6th International Symposium, SARA 2005, Airth Castle, Scotland, UK, July 26-29, 2005, Proceedings. SARA, 2005.

[17] Doina Precup, et al. Temporal abstraction in reinforcement learning. ICML, 2000.

[18] Craig Boutilier, et al. Exploiting Structure in Policy Construction. IJCAI, 1995.

[19] Alicia P. Wolfe. Defining Object Types and Options Using MDP Homomorphisms, 2006.

[20] Thomas G. Dietterich, et al. Automatic discovery and transfer of MAXQ hierarchies. ICML, 2008.

[21] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density. ICML, 2001.

[22] Peter Stone, et al. Reinforcement Learning for RoboCup Soccer Keepaway. Adapt. Behav., 2005.

[23] Peng Zhou, et al. Discovering options from example trajectories. ICML, 2009.

[24] Doina Precup, et al. Metrics for Finite Markov Decision Processes. AAAI, 2004.

[25] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[26] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes. Artif. Intell., 2003.

[27] Scott Kuindersma, et al. Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories. NIPS, 2010.

[28] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning. Discret. Event Dyn. Syst., 2003.

[29] Vishal Soni, et al. Using Homomorphisms to Transfer Options across Continuous Reinforcement Learning Domains. AAAI, 2006.

[30] Doina Precup, et al. Optimal policy switching algorithms for reinforcement learning. AAMAS, 2010.

[31] Peter Stone, et al. Transfer Learning for Reinforcement Learning Domains: A Survey. J. Mach. Learn. Res., 2009.

[32] Balaraman Ravindran, et al. Relativized Options: Choosing the Right Transformation. ICML, 2003.

[33] Doina Precup, et al. Methods for Computing State Similarity in Markov Decision Processes. UAI, 2006.

[34] Richard S. Sutton, et al. Reinforcement Learning: An Introduction. MIT Press, 1998.