In Search of the Neural Circuits of Intrinsic Motivation

Children seem to acquire new know-how in a continuous and open-ended manner. In this paper, we hypothesize that an intrinsic motivation to progress in learning is at the origin of the remarkable structure of children's developmental trajectories. In this view, children engage in exploratory and playful activities for their own sake, not as steps toward other extrinsic goals. Our central hypothesis is that intrinsically motivating activities are those associated with an expected decrease in prediction error. Such a motivation system pushes the infant to avoid both fully predictable and fully unpredictable situations, and to focus instead on those expected to maximize progress in learning. Based on a computational model and a series of robotic experiments, we show how this principle can lead to organized sequences of behavior of increasing complexity, characteristic of several behavioral and developmental patterns observed in humans. We then discuss the putative circuitry underlying such an intrinsic motivation system in the brain and formulate two novel hypotheses. The first is that tonic dopamine acts as a learning progress signal. The second is that this progress signal is directly computed through a hierarchy of microcortical circuits that act both as prediction and metaprediction systems.
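The idea that learning progress corresponds to a decrease in prediction error can be illustrated with a minimal sketch. This is not the computational model used in the paper, whose details are not given here. It assumes that progress is estimated as the difference between the mean prediction error over an older window and over a more recent window. The names `learning_progress` and `choose_activity`, the window size, and the error values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def learning_progress(errors, window=5):
    """Estimate progress as the drop in mean prediction error
    between the previous window and the most recent window.
    (Illustrative assumption, not the paper's exact measure.)"""
    if len(errors) < 2 * window:
        return 0.0  # not enough history to estimate progress
    older = mean(errors[-2 * window:-window])
    recent = mean(errors[-window:])
    return older - recent


def choose_activity(error_histories, window=5):
    """Pick the activity whose recent error history shows the largest progress."""
    return max(
        error_histories,
        key=lambda name: learning_progress(error_histories[name], window),
    )


# Hypothetical prediction-error histories for three kinds of situations.
histories = {
    # Already mastered: error is low and stays low, so no progress.
    "predictable": [0.01] * 10,
    # Pure noise: error is high and does not improve, so no progress.
    "unpredictable": [0.9, 1.0, 1.0, 0.9, 0.95, 0.9, 1.0, 1.0, 0.9, 0.95],
    # Learnable: error decreases steadily, so progress is positive.
    "learnable": [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],
}

chosen = choose_activity(histories)
print(chosen)
```

Under these assumptions, both the predictable and the unpredictable situations yield no measurable progress, so the selection falls on the learnable one, which is the qualitative behavior the hypothesis describes.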
