Symbol emergence by combining a reinforcement learning schema model with asymmetric synaptic plasticity

We describe a novel integrative learning architecture that combines a reinforcement learning schema model (RLSM) with a spike timing-dependent plasticity (STDP) network. This architecture models symbol emergence in an autonomous agent engaged in reinforcement learning tasks. It consists of two constituent learning components: the RLSM and the STDP network. RLSM is an incremental modular reinforcement learning architecture that enables an autonomous agent to acquire behavioral concepts incrementally through continuous interaction with its environment and/or caregivers. STDP is a rule of neuronal plasticity found in cerebral cortices and the hippocampus; it is temporally asymmetric, in contrast with the classical Hebbian learning rule. We found that STDP enables an autonomous robot to associate auditory input with its acquired behavioral concepts and to select reinforcement learning modules more effectively. Auditory signals interpreted through the acquired behavioral concepts are shown to correspond to “signs” in Peirce’s semiotic triad. The integrative learning architecture is evaluated in the context of modular learning.
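To make the temporal asymmetry concrete, the following is a minimal sketch of a standard exponential STDP window, where the sign of the weight change depends on whether the presynaptic spike precedes or follows the postsynaptic spike. The parameter values (amplitudes and time constants) are illustrative assumptions, not those used in the architecture described above.

```python
import math

def stdp_delta_w(dt, a_plus=0.1, a_minus=0.12, tau_plus=20.0, tau_minus=20.0):
    """Temporally asymmetric STDP weight change.

    dt = t_post - t_pre in milliseconds.
    Pre-before-post (dt > 0): potentiation, decaying with tau_plus.
    Post-before-pre (dt < 0): depression, decaying with tau_minus.
    All parameter values here are illustrative placeholders.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    elif dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0
```

Unlike a symmetric Hebbian rule, reversing the spike order flips the sign of the update, which is what lets the network encode the temporal order of events such as an auditory cue preceding a behavior.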
