Estimating the hidden learning representations

Successful adaptation relies on the ability to learn the consequences of our actions in different environments. Understanding the neural bases of this ability, however, remains one of the great challenges of systems neuroscience: the plasticity changes that occur during learning cannot be fully controlled experimentally, and their evolution is hidden. Our approach is to derive hypotheses about the structure and dynamics of these hidden plasticity changes from behavioral learning theory. Behavioral models of animal learning provide testable predictions about the hidden learning representations by formalizing their relation to the observables of the experiment (stimuli, actions, and outcomes). We can therefore ask whether, and how, the predicted learning processes are represented at the neural level by estimating their evolution and correlating it with neural data. Here we present a Bayesian modeling approach that estimates the evolution of the internal learning representations from the observations of the experiment (state estimation), and identifies the set of model parameters (parameter estimation) and the class of behavioral model (model selection) most likely to have generated a given sequence of actions and outcomes. Specifically, we use sequential Monte Carlo methods for state estimation and the maximum likelihood principle for model selection and parameter estimation. We show that the method recovers simulated trajectories of learning sessions on a single-trial basis and yields predictions about the activity of the different categories of neurons that should participate in the learning process. By correlating the estimated evolutions of the learning variables with neural recordings, we will be able to test the validity of different models of instrumental learning and possibly identify the neural bases of learning.
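The state-estimation step can be sketched as a bootstrap particle filter tracking the hidden action values of a simulated Q-learning agent. The two-armed bandit task, the parameter values, and the diffusion noise below are illustrative assumptions for the sketch, not the paper's actual task or model.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Ground truth: simulate a Q-learning agent on a two-armed bandit ---
alpha, beta = 0.2, 3.0           # learning rate, softmax inverse temperature
p_reward = np.array([0.8, 0.2])  # reward probability of each action
T = 200
Q_true = np.zeros(2)
actions, rewards, Q_traj = [], [], []
for _ in range(T):
    p = np.exp(beta * Q_true); p /= p.sum()   # softmax policy
    a = int(rng.choice(2, p=p))
    r = float(rng.random() < p_reward[a])
    Q_true[a] += alpha * (r - Q_true[a])      # delta-rule update
    actions.append(a); rewards.append(r); Q_traj.append(Q_true.copy())

# --- Bootstrap particle filter over the hidden Q-values ---
# Each particle is a candidate Q-vector. On every trial it is weighted by
# the likelihood of the observed action under the softmax policy, resampled,
# then propagated with the same delta rule plus small diffusion noise.
N = 1000
particles = np.zeros((N, 2))
Q_est = []
for a, r in zip(actions, rewards):
    logits = beta * particles
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    w = probs[:, a]
    w /= w.sum()
    idx = rng.choice(N, size=N, p=w)          # multinomial resampling
    particles = particles[idx]
    particles[:, a] += alpha * (r - particles[:, a])
    particles += rng.normal(0.0, 0.01, particles.shape)
    Q_est.append(particles.mean(axis=0))      # posterior-mean estimate

rmse = np.sqrt(np.mean((np.array(Q_est) - np.array(Q_traj)) ** 2))
print(f"single-trial RMSE of the recovered Q trajectory: {rmse:.3f}")
```

The posterior mean over particles gives a trial-by-trial estimate of the hidden value trajectory, which is the quantity that would then be correlated with neural activity.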
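The parameter-estimation step can be sketched similarly: because the delta-rule update is deterministic given the observed actions and rewards, the action log-likelihood of any candidate learning rate can be computed by replaying the session and maximized over a grid. The task, the fixed inverse temperature, and the grid search are again simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a session from a Q-learning agent with a known learning rate
alpha_true, beta = 0.2, 3.0
p_reward = np.array([0.8, 0.2])
Q = np.zeros(2)
actions, rewards = [], []
for _ in range(500):
    p = np.exp(beta * Q); p /= p.sum()
    a = int(rng.choice(2, p=p))
    r = float(rng.random() < p_reward[a])
    Q[a] += alpha_true * (r - Q[a])
    actions.append(a); rewards.append(r)

def log_likelihood(alpha):
    """Replay the deterministic Q-update, scoring each observed action."""
    Q = np.zeros(2)
    ll = 0.0
    for a, r in zip(actions, rewards):
        p = np.exp(beta * Q); p /= p.sum()
        ll += np.log(p[a])
        Q[a] += alpha * (r - Q[a])
    return ll

grid = np.linspace(0.01, 0.9, 90)
alpha_hat = grid[np.argmax([log_likelihood(a) for a in grid])]
print(f"true alpha = {alpha_true}, ML estimate = {alpha_hat:.2f}")
```

Comparing the maximized likelihoods of different model classes (e.g., Rescorla-Wagner versus Q-learning variants) is what the abstract refers to as model selection.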
