Distinction between types of motivations: Emergent behavior with a neural, model-based reinforcement learning system

In this paper, we analyze the behavior of a simulated mobile robot that interacts with an initially unknown maze environment. The robot is controlled by an interactive system based on a model-building Time Growing Neural Gas (TGNG) algorithm and a homeostatic motivational system, which activates movement preferences and goals within the emergent model structure for behavioral control. We propose to differentiate (at least) two types of drives, which we call location-based and characteristics-based drives, and we implement the two types exemplarily as “hunger” and “fear”, respectively. Several methods of combining the two drives are investigated in simulation, identifying the combination that leads to the most suitable emergent behavior, such as emergent “wall-following” and “hiding”. Moreover, we investigate performance in an ALife-like scenario in which the robot interacts with several food dispensers. We show that additional behavioral concepts, such as “curiosity” and “inhibition of return”, can maximize the survival chances of the organism, which maintains maximal safety while keeping its belly full. In conclusion, we propose that the concept of motivation needs to be further differentiated to realize autonomous, life-like robots that can optimally satisfy multiple, competing types of motivations through emergent, innovative behavioral patterns.
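To make the distinction concrete, the following Python snippet is a purely illustrative sketch of one way a location-based and a characteristics-based drive could be combined on a topological map. The toy graph, the additive combination rule, and all names and parameter values (GAMMA, FEAR_WEIGHT, the node labels) are assumptions for illustration only, not the authors’ implementation; in the paper, the map itself emerges from TGNG learning rather than being hand-coded.

```python
GAMMA = 0.7        # hypothetical discount for spreading goal activation
FEAR_WEIGHT = 0.5  # hypothetical relative weight of the fear drive

# Toy hand-coded map standing in for an emergent TGNG graph:
# node -> neighboring nodes. "open" is an exposed corridor between the
# start and the food source; "cover" is a safe dead end for hiding.
EDGES = {
    "start": ["open", "cover"],
    "open":  ["start", "food"],
    "cover": ["start"],
    "food":  ["open"],
}
FOOD_NODES = {"food"}                                  # location-based goal
DANGER = {"start": 0.1, "open": 0.9, "cover": 0.0, "food": 0.2}

def hunger_gradient(n_sweeps=20):
    """Location-based drive: spread goal activation from food locations
    through the graph (value-iteration style), so every node carries a
    gradient pointing toward food."""
    value = {n: (1.0 if n in FOOD_NODES else 0.0) for n in EDGES}
    for _ in range(n_sweeps):
        for n in EDGES:
            if n not in FOOD_NODES:
                value[n] = GAMMA * max(value[m] for m in EDGES[n])
    return value

def combined_preference(hunger_level):
    """One possible additive combination: hunger scales the location-based
    gradient, while fear penalizes each node by its danger characteristics,
    independently of where the food is."""
    hunger = hunger_gradient()
    return {n: hunger_level * hunger[n] - FEAR_WEIGHT * DANGER[n]
            for n in EDGES}

def next_node(current, hunger_level):
    """Greedy behavioral control: step to the neighbor with the highest
    combined drive activation."""
    pref = combined_preference(hunger_level)
    return max(EDGES[current], key=lambda m: pref[m])

if __name__ == "__main__":
    print(next_node("start", hunger_level=2.0))  # "open": hunger dominates
    print(next_node("start", hunger_level=0.5))  # "cover": fear -> hiding
```

Even this toy combination shows the qualitative effect discussed in the abstract: with high hunger the gradient toward food outweighs the exposure penalty of the open corridor, while with low hunger the fear term dominates and a simple “hiding” preference emerges.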
