Efficient skill learning using abstraction selection

We present an algorithm for selecting an appropriate abstraction when learning a new skill. We show empirically that it can consistently select an appropriate abstraction using very little sample data, and that it significantly improves skill learning performance in a reasonably large real-valued reinforcement learning domain.

[1]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[2]  Maja J. Matarić,et al.  Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks , 1996 .

[3]  Doina Precup,et al.  Intra-Option Learning about Temporally Abstract Actions , 1998, ICML.

[4]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[5]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[6]  Justin A. Boyan,et al.  Least-Squares Temporal Difference Learning , 1999, ICML.

[7]  Thomas G. Dietterich Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[8]  Andrew G. Barto,et al.  Automated State Abstraction for Options using the U-Tree Algorithm , 2000, NIPS.

[9]  Bernhard Hengst,et al.  Discovering Hierarchy in Reinforcement Learning with HEXQ , 2002, ICML.

[10]  Sridhar Mahadevan,et al.  Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[11]  Nuttapong Chentanez,et al.  Intrinsically Motivated Reinforcement Learning , 2004, NIPS.

[12]  Alicia P. Wolfe,et al.  Identifying useful subgoals in reinforcement learning by local graph partitioning , 2005, ICML.

[13]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[14]  Christos G. Cassandras,et al.  Discrete-Event Systems , 2005, Handbook of Networked and Embedded Control Systems.

[15]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[16]  Thomas J. Walsh,et al.  Towards a Unified Theory of State Abstraction for MDPs , 2006, AI&M.

[17]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[18]  Bram Bakker,et al.  Reinforcement Learning with Multiple, Qualitatively Different State Representations , 2007 .

[19]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[20]  George Konidaris,et al.  Value Function Approximation in Reinforcement Learning Using the Fourier Basis , 2011, AAAI.