Self-Organizing Maps as a Storage and Transfer Mechanism in Reinforcement Learning

The idea of reusing information from previously learned tasks (source tasks) to learn new tasks (target tasks) has the potential to significantly improve the sample efficiency of reinforcement learning agents. In this work, we describe an approach to concisely store and represent learned task knowledge, and to reuse it by allowing it to guide an agent's exploration while it learns new tasks. To do so, we use a similarity measure defined directly in the space of the parameterized representations of the value functions. This similarity measure also serves as the basis for a variant of the growing self-organizing map algorithm, which is simultaneously used to store previously acquired task knowledge in an adaptive and scalable manner. We empirically validate our approach in a simulated navigation environment and discuss possible extensions to this approach, along with potential applications where it could be particularly useful.
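
The sketch below is only an illustration of the general idea described above, not the paper's implementation: it assumes each task's knowledge is summarized by the weight vector of its learned value-function approximation, measures task similarity directly in that parameter space, and uses a toy growing self-organizing map whose nodes store such weight vectors. The names (`GrowingSOM`, `vf_similarity`) and the simple distance-based growth rule with `growth_threshold` and `lr` are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: task knowledge is a value-function weight vector, and
# similarity between tasks is computed directly in this parameter space.

def vf_similarity(theta_a, theta_b):
    """Similarity as negative Euclidean distance between weight vectors."""
    return -np.linalg.norm(np.asarray(theta_a) - np.asarray(theta_b))

class GrowingSOM:
    """Toy growing self-organizing map whose nodes store value-function weights."""

    def __init__(self, growth_threshold=0.5, lr=0.1):
        self.nodes = []                           # one weight vector per stored task cluster
        self.growth_threshold = growth_threshold  # dissimilarity beyond which a new node grows
        self.lr = lr                              # adaptation rate for the winning node

    def best_match(self, theta):
        """Return (index, similarity) of the most similar stored node."""
        sims = [vf_similarity(theta, n) for n in self.nodes]
        idx = int(np.argmax(sims))
        return idx, sims[idx]

    def update(self, theta):
        """Insert or adapt a node for a newly learned task's weights."""
        theta = np.asarray(theta, dtype=float)
        if not self.nodes:
            self.nodes.append(theta.copy())
            return 0
        idx, sim = self.best_match(theta)
        if -sim > self.growth_threshold:          # too dissimilar: grow a new node
            self.nodes.append(theta.copy())
            return len(self.nodes) - 1
        self.nodes[idx] += self.lr * (theta - self.nodes[idx])  # adapt the winner
        return idx

# Usage: store weights of solved source tasks, then query with the (partially
# learned) weights of a target task to select stored knowledge to guide exploration.
som = GrowingSOM(growth_threshold=0.5)
som.update([0.1, 0.2, 0.0, 0.3])
som.update([0.9, 0.8, 1.0, 0.7])
idx, sim = som.best_match(np.array([0.12, 0.18, 0.05, 0.28]))
print(f"most similar stored task: node {idx}, similarity {sim:.3f}")
```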
