Self-Organizing Sensorimotor Maps Plus Internal Motivations Yield Animal-Like Behavior

This article investigates how a motivational module can drive an animat to learn a sensorimotor cognitive map and to use that map to generate flexible, goal-directed behavior. The map is built iteratively by the time growing neural gas (TGNG) algorithm, which is inspired by the rat’s hippocampus and neighboring areas and relies on temporal Hebbian learning. The algorithm is combined with a motivation module, which activates goals and priorities and thereby induces activity gradients in the developing cognitive map for the self-motivated control of behavior. The resulting motivated TGNG thus combines a neural cognitive map learning process with top-down, self-motivated, anticipatory behavior control mechanisms. Although the algorithms involved are kept rather simple, motivated TGNG displays several emergent behavioral patterns, self-sustainment, and reliable latent learning. We conclude that motivated TGNG constitutes a solid basis for future studies on self-motivated cognitive map learning, on the design of further enhanced systems with additional cognitive modules, and on the realization of highly adaptive, interactive, goal-directed cognitive systems.
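To make the interplay of map learning and motivated control more concrete, the following Python sketch illustrates the general idea under strong simplifying assumptions. It is not the TGNG algorithm itself: nodes are inserted whenever a sensor reading lies farther than a fixed radius from every existing node, an edge between successively visited nodes stands in for temporal Hebbian learning, and the activity gradient is computed by a simple decayed maximum propagation from goal nodes weighted by their priorities. All names (`SketchMap`, `observe`, `gradient`, `next_motor`) and parameters (`radius`, `decay`, `sweeps`) are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class SketchMap:
    """Toy sensorimotor map (illustrative only, not the published TGNG).

    Nodes are prototype sensor vectors; a directed edge i -> j stores the
    motor command that was active when the agent moved from node i to node j.
    """

    def __init__(self, radius):
        self.radius = radius              # assumed fixed insertion threshold
        self.nodes = []                   # prototype sensor vectors
        self.edges = defaultdict(dict)    # edges[i][j] = motor command
        self.prev = None                  # previously active node

    def nearest(self, x):
        """Return (index, distance) of the closest node, or (None, inf)."""
        best, best_d = None, math.inf
        for i, p in enumerate(self.nodes):
            d = math.dist(p, x)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def observe(self, x, motor=None):
        """Update the map with sensor reading x reached via `motor`."""
        i, d = self.nearest(x)
        if i is None or d > self.radius:
            self.nodes.append(tuple(x))   # grow the map
            i = len(self.nodes) - 1
        # Crude stand-in for temporal Hebbian learning: link successive nodes.
        if self.prev is not None and self.prev != i and motor is not None:
            self.edges[self.prev][i] = motor
        self.prev = i
        return i

    def gradient(self, goals, decay=0.9, sweeps=50):
        """Propagate goal priorities backwards along edges.

        `goals` maps node index -> priority (e.g. a current drive level).
        """
        act = [goals.get(i, 0.0) for i in range(len(self.nodes))]
        for _ in range(sweeps):
            for i in range(len(self.nodes)):
                spread = max((decay * act[j] for j in self.edges[i]),
                             default=0.0)
                act[i] = max(goals.get(i, 0.0), spread)
        return act

    def next_motor(self, current, act):
        """Pick the motor command leading to the most active neighbor."""
        if not self.edges[current]:
            return None
        j = max(self.edges[current], key=lambda k: act[k])
        return self.edges[current][j]


if __name__ == "__main__":
    world = SketchMap(radius=0.5)
    for step in range(5):                 # walk east along a corridor
        world.observe((float(step), 0.0), motor="east")
    activity = world.gradient({4: 1.0})   # e.g. food located at the last node
    print(world.next_motor(0, activity))
```

In this sketch, raising the priority of one goal relative to another changes which gradient dominates at a given node, which is one simple way to picture how competing motivations could bias behavior in a learned map.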
