TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs

Reinforcement learning is one of the main adaptive mechanisms that is both well documented in animal behaviour and giving rise to computational studies in animats and robots. In this paper, we present TeXDYNA, an algorithm designed to solve large reinforcement learning problems with unknown structure by integrating hierarchical abstraction techniques of Hierarchical Reinforcement Learning and factorization techniques of Factored Reinforcement Learning. We validate our approach on the LIGHT BOX problem.

[1]  Pierre-Yves Oudeyer,et al.  Intrinsic Motivation Systems for Autonomous Mental Development , 2007, IEEE Transactions on Evolutionary Computation.

[2]  Christopher M. Vigorito,et al.  Autonomous Hierarchical Skill Acquisition in Factored MDPs , 2008 .

[3]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[4]  Sridhar Mahadevan,et al.  Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[5]  Andrew G. Barto,et al.  Hierarchical Representations of Behavior for Efficient Creative Search , 2008, AAAI Spring Symposium: Creative Intelligent Systems.

[6]  Richard S. Sutton,et al.  Dyna, an integrated architecture for learning, planning, and reacting , 1990, SGAR.

[7]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[8]  Bernhard Hengst,et al.  Discovering Hierarchy in Reinforcement Learning with HEXQ , 2002, ICML.

[9]  Andrew G. Barto,et al.  A causal approach to hierarchical decomposition of factored MDPs , 2005, ICML.

[10]  Anders Jonsson,et al.  A causal approach to hierarchical decomposition in reinforcement learning , 2006 .

[11]  Andrew G. Barto,et al.  Causal Graph Based Decomposition of Factored MDPs , 2006, J. Mach. Learn. Res..

[12]  András Lörincz,et al.  The many faces of optimism: a unifying approach , 2008, ICML '08.

[13]  Craig Boutilier,et al.  Exploiting Structure in Policy Construction , 1995, IJCAI.

[14]  Craig Boutilier,et al.  Stochastic dynamic programming with factored representations , 2000, Artif. Intell..

[15]  Nuttapong Chentanez,et al.  Intrinsically Motivated Reinforcement Learning , 2004, NIPS.

[16]  Olivier Sigaud,et al.  Learning the structure of Factored Markov Decision Processes in reinforcement learning problems , 2006, ICML.

[17]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.