From Exploration to Planning

Learning and behaviour of mobile robots face limitations. In reinforcement learning, for example, an agent learns a strategy for reaching only one specific target point within a state space. Humans, by contrast, can grasp a visually localized object at any point in space or navigate to any position in a room. We present a neural network model in which an agent learns a model of the state space that allows it to reach an arbitrarily chosen goal via a short route. By randomly exploring the state space, the agent learns associations between two adjoining states and the action that links them. Given arbitrary starting and goal positions, route-finding proceeds in two steps. First, an activation gradient spreads outward from the goal position along the associative connections. Second, the agent uses the state-action associations to determine the actions needed to ascend the gradient toward the goal. All mechanisms are biologically justifiable.
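The following is a minimal sketch of the two-step route-finding scheme described above, not the paper's implementation: random exploration stores state-state-action associations, an activation gradient then spreads from the goal, and the agent ascends it using the stored associations. The grid size, action set, and decay factor are illustrative assumptions.

```python
import random

# Assumed toy grid world and action set (not taken from the paper).
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
SIZE = 10  # assumed 10x10 state space

def explore(steps=5000):
    """Random exploration: learn which action links two adjoining states."""
    links = {}  # (state, next_state) -> action
    s = (random.randrange(SIZE), random.randrange(SIZE))
    for _ in range(steps):
        a = random.choice(list(ACTIONS))
        dx, dy = ACTIONS[a]
        nxt = (min(max(s[0] + dx, 0), SIZE - 1),
               min(max(s[1] + dy, 0), SIZE - 1))
        if nxt != s:
            links[(s, nxt)] = a
        s = nxt
    return links

def spread_activation(goal, links, decay=0.9):
    """Step 1: propagate an activation gradient outward from the goal."""
    activation = {goal: 1.0}
    frontier = [goal]
    while frontier:
        next_frontier = []
        for state in frontier:
            for (s, n), _ in links.items():
                # activation flows backwards along learned associations
                if n == state and s not in activation:
                    activation[s] = activation[state] * decay
                    next_frontier.append(s)
        frontier = next_frontier
    return activation

def plan(start, goal, links):
    """Step 2: greedily ascend the gradient via state-action associations."""
    activation = spread_activation(goal, links)
    path, s = [], start
    while s != goal and len(path) < SIZE * SIZE:
        candidates = [(activation.get(n, 0.0), a, n)
                      for (st, n), a in links.items() if st == s]
        if not candidates:
            break  # state was never visited during exploration
        _, a, s = max(candidates)
        path.append(a)
    return path

links = explore()
print(plan((0, 0), (7, 5), links))
```

Under these assumptions the activation value decays with distance from the goal, so following its gradient yields a short route along transitions the agent has actually experienced.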
