The Successor Representation as a model of behavioural flexibility

Accounting for the behavioural capabilities and flexibility experimentally observed in animals is a major challenge in computational neuroscience. Comprehensive algorithmic frameworks for this purpose generally take the model-free and model-based reinforcement learning (RL) components as reference, either in isolation or in combination. In this article, we consider the RL Successor Representation (SR) approach as an alternative. We compare it to the standard model-free and model-based models on three relevant experimental data sets. These modelling experiments demonstrate that the SR accounts better for several forms of behavioural flexibility while being algorithmically simpler.
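As background for the comparison above, the SR occupies a middle ground between model-free and model-based RL: it caches, for each state, the expected discounted future occupancy of every other state, so that values factorise into this predictive map times a learned reward vector. The sketch below illustrates the standard tabular SR temporal-difference update on a toy three-state chain; the environment, constants, and function names are illustrative choices, not taken from the article.

```python
# Tabular Successor Representation (SR) sketch on a toy chain:
# s0 -> s1 -> s2 (terminal, rewarded). All names and constants here
# are illustrative assumptions, not the article's actual models.

GAMMA = 0.95   # discount factor
ALPHA = 0.1    # learning rate
N = 3          # number of states

# M[s][sp] ~ expected discounted future occupancy of sp starting from s
M = [[0.0] * N for _ in range(N)]
w = [0.0, 0.0, 1.0]  # reward weights: only s2 is rewarded

def sr_update(s, s_next, terminal=False):
    """TD update of the SR after observing the transition s -> s_next."""
    for sp in range(N):
        indicator = 1.0 if sp == s else 0.0
        bootstrap = 0.0 if terminal else M[s_next][sp]
        M[s][sp] += ALPHA * (indicator + GAMMA * bootstrap - M[s][sp])

def value(s):
    """State value factorises as V(s) = sum_sp M(s, sp) * w(sp)."""
    return sum(M[s][sp] * w[sp] for sp in range(N))

# Learn from repeated traversals of the chain.
for _ in range(2000):
    sr_update(0, 1)
    sr_update(1, 2)
    sr_update(2, 2, terminal=True)  # final self-indicator, no bootstrap

print(round(value(0), 2))  # ~ 0.9, i.e. GAMMA**2, two steps to reward
```

The appeal for behavioural flexibility is visible in the factorisation: if the reward vector `w` changes (as in reward-devaluation or latent-learning paradigms), values can be recomputed immediately from the cached `M` without relearning the environment's dynamics, yet no explicit transition model or tree search is required.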
