Utility of Turning Spot Learning under complex goal search and the limit of memory usage

Chain Form Reinforcement Learning (CFRL) was proposed to let a reinforcement learning agent operate with low memory. However, CFRL still holds unused information in memory. In this paper, we introduce Turning Spot Learning (TSL), a method that allows an agent to learn with even less memory than a CFRL agent. TSL imitates human perception: when asked for directions, we usually describe only the spots where our action changes. We call such a spot a "Turning Spot". Each Turning Spot retains the state, the action, and the distance from the present spot to the next spot. A TSL agent learns only Turning Spots and selects actions with our original action selection method based on the nearest neighbor algorithm. In addition, we attempted to limit the amount of memory a TSL agent can use. We compared our method with Q-Learning and CFRL on two kinds of goal search problems, examined their performance, and discussed the environments to which each method is best suited.
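
The following is a minimal sketch of the Turning Spot idea as the abstract describes it: each spot stores a state, an action, and the distance to the next spot, and the agent acts according to the nearest stored spot. The 2-D grid states, the Euclidean metric, and all names (TurningSpot, select_action) are our assumptions for illustration, not the authors' implementation.

    # Hypothetical sketch of Turning Spot Learning's memory and action selection.
    # Assumptions: 2-D grid states and Euclidean distance; the paper's actual
    # state representation and metric may differ.
    from dataclasses import dataclass
    import math

    @dataclass
    class TurningSpot:
        state: tuple      # position where the agent's action changes
        action: str       # action taken from this spot onward
        distance: float   # distance from this spot to the next Turning Spot

    def select_action(current_state, spots):
        """Return the action of the Turning Spot nearest to the current state
        (nearest neighbor action selection)."""
        nearest = min(spots, key=lambda s: math.dist(current_state, s.state))
        return nearest.action

    # Toy usage: a route with two turning points in a grid world.
    spots = [
        TurningSpot(state=(0, 0), action="east", distance=4.0),
        TurningSpot(state=(4, 0), action="north", distance=3.0),
    ]
    print(select_action((1, 0), spots))  # -> "east"
    print(select_action((4, 2), spots))  # -> "north"

Because only the turning points of a route are stored, memory usage grows with the number of action changes rather than with the number of visited states, which is the saving over CFRL that the abstract claims.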
