Instrumental Conditioning Driven by Apparently Neutral Stimuli: A Model Tested with a Simulated Robotic Rat

Current models of reinforcement learning assume that learning must be guided by rewarding (unconditioned) stimuli. However, there is empirical evidence that dopamine bursts, which are commonly regarded as reinforcement learning signals, can also be triggered by apparently neutral stimuli, and that this can lead to conditioning phenomena in the absence of any rewarding stimulus. In this paper we present a computational model, based on a hypothesis proposed in Redgrave and Gurney (2006), in which the superior colliculus (a dorsal midbrain structure) directly triggers dopamine release when it detects novel visual stimuli, and this release supports instrumental conditioning. The model incorporates various biological constraints, for example the anatomical and physiological data on the micro-architecture of the superior colliculus presented in Binns and Salt (1997). The model is validated by reproducing, with a simulated robotic rat, the results of an experiment with real rats, reported in Reed et al. (1996), on the intrinsically reinforcing properties of apparently neutral stimuli.
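The core idea, a novelty-triggered dopamine-like signal reinforcing the action that produced the stimulus, can be illustrated with a toy simulation. The sketch below is not the model presented in the paper: it is a minimal illustration loosely inspired by the two-lever setup named in the title of reference [7]. It assumes that one lever switches on a light and the other does nothing, and that the novelty response habituates with repeated exposure. All names and parameter values (`simulate`, `habituation`, `recovery`, `temperature`, and so on) are hypothetical choices made for illustration.

```python
import math
import random


def simulate(trials=500, lr=0.1, habituation=0.02, recovery=0.01,
             temperature=0.2, seed=0):
    """Toy two-lever task: lever 0 flashes a light, lever 1 does nothing.

    Illustrative assumption only: a phasic 'dopamine' signal is emitted
    when the light appears, scaled by how novel the light currently is,
    and this signal strengthens the preference for the action just taken.
    No external reward is ever delivered.
    """
    rng = random.Random(seed)
    prefs = [0.0, 0.0]   # action preferences for lever 0 and lever 1
    novelty = 1.0        # current novelty of the light, in [0, 1]
    presses = [0, 0]     # how often each lever was pressed

    for _ in range(trials):
        # Softmax choice between the two levers.
        p_lever0 = 1.0 / (1.0 + math.exp(-(prefs[0] - prefs[1]) / temperature))
        action = 0 if rng.random() < p_lever0 else 1
        presses[action] += 1

        if action == 0:
            # The light appears: burst proportional to novelty,
            # then the novelty response habituates.
            dopamine = novelty
            novelty = max(0.0, novelty - habituation)
        else:
            # No stimulus: no burst, and novelty slowly recovers.
            dopamine = 0.0
            novelty = min(1.0, novelty + recovery)

        # Dopamine-gated update of the chosen action's preference.
        prefs[action] += lr * dopamine

    return prefs, presses


if __name__ == "__main__":
    prefs, presses = simulate()
    print("preferences:", prefs)
    print("presses:", presses)
```

In this toy setting only the light-producing lever ever elicits the signal, so its preference grows while the other stays at zero, even though neither lever delivers a reward. The sketch has no decay of learned preferences, so the bias persists after the light has habituated; whether and how a bias fades is a modelling choice it does not address.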

[1]  E. Thorndike.  Animal Intelligence , 1898, Nature.

[2]  P. Dayan,et al.  A framework for mesencephalic dopamine systems based on predictive Hebbian learning , 1996, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[3]  Gianluca Baldassarre,et al.  The Role of Amygdala in Devaluation: A Model Tested with a Simulated Rat , 2007 .

[4]  Stewart W. Wilson,et al.  A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers , 1991 .

[5]  P. Redgrave,et al.  A direct projection from superior colliculus to substantia nigra pars compacta in the cat , 2006, Neuroscience.

[6]  G. Baldassarre,et al.  Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot , 2007, 2007 IEEE 6th International Conference on Development and Learning.

[7]  T. Nokes,et al.  Intrinsic reinforcing properties of putatively neutral stimuli in an instrumental two-lever discrimination task , 1996 .

[8]  Nuttapong Chentanez,et al.  Intrinsically Motivated Learning of Hierarchical Collections of Skills , 2004 .

[9]  Peter Dayan,et al.  A Neural Substrate of Prediction and Reward , 1997, Science.

[10]  A. Dickinson,et al.  Neuronal coding of prediction errors. , 2000, Annual review of neuroscience.

[11]  Pierre-Yves Oudeyer,et al.  Intrinsic Motivation Systems for Autonomous Mental Development , 2007, IEEE Transactions on Evolutionary Computation.

[12]  R. Claire-Smith,et al.  Response preconditioning effects. , 1983 .

[13]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[14]  J. Mayhew,et al.  How Visual Stimuli Activate Dopaminergic Neurons at Short Latency , 2005, Science.

[15]  Francesco Mannella,et al.  A Computational Model of the Amygdala Nuclei's Role in Second Order Conditioning , 2008, SAB.

[16]  M. Wallace,et al.  Multisensory integration in the superior colliculus of the alert cat. , 1998, Journal of neurophysiology.

[17]  Christian Balkenius,et al.  Emotional Learning: A Computational Model of the Amygdala , 2001, Cybern. Syst..

[18]  Peter Redgrave,et al.  Phasic activation of substantia nigra and the ventral tegmental area by chemical stimulation of the superior colliculus: an electrophysiological investigation in the rat , 2003, The European journal of neuroscience.

[20]  T. Isa,et al.  Release from GABAA receptor-mediated inhibition unmasks interlaminar connection within superior colliculus in anesthetized adult rats , 2003, Neuroscience Research.

[21]  G. Baldassarre,et al.  Modelling Perception with Artificial Neural Networks: The interplay of Pavlovian and instrumental processes in devaluation experiments: a computational embodied neuroscience model tested with a simulated rat , 2010 .

[22]  T. Salt,et al.  Different roles for GABAA and GABAB receptors in visual processing in the rat superior colliculus , 1997, The Journal of physiology.

[23]  J. E. Albano,et al.  Visual-motor function of the primate superior colliculus. , 1980, Annual review of neuroscience.

[24]  H. Yin,et al.  The role of the basal ganglia in habit formation , 2006, Nature Reviews Neuroscience.

[25]  Peter Redgrave,et al.  A direct projection from superior colliculus to substantia nigra for detecting salient visual events , 2003, Nature Neuroscience.

[26]  P. Dayan,et al.  Reward, Motivation, and Reinforcement Learning , 2002, Neuron.

[27]  James L. McClelland,et al.  The time course of perceptual choice: the leaky, competing accumulator model. , 2001, Psychological review.

[28]  Thomas E. Hazy,et al.  PVLV: the primary value and learned value Pavlovian learning algorithm. , 2007, Behavioral neuroscience.

[29]  A G Barto,et al.  Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.