Intrinsically motivated action-outcome learning and goal-based action recall: a system-level bio-constrained computational model.

Reinforcement (trial-and-error) learning in animals is driven by a multitude of processes. Most animals have evolved several sophisticated systems of 'extrinsic motivations' (EMs) that guide them to acquire behaviours allowing them to maintain their bodies, defend against threats, and reproduce. Animals have also evolved various systems of 'intrinsic motivations' (IMs) that allow them to acquire actions in the absence of extrinsic rewards. These actions are later used to pursue such rewards when they become available. Intrinsic motivations have been studied in psychology for many decades, and their biological substrates are now being elucidated by neuroscientists. In the last two decades, investigators in computational modelling, robotics and machine learning have proposed various mechanisms that capture certain aspects of IMs. However, we still lack models of IMs that attempt to integrate all key aspects of intrinsically motivated learning and behaviour while taking into account the relevant neurobiological constraints. This paper proposes a bio-constrained system-level model that takes a major step towards this integration. The model focusses on three processes related to IMs and on the neural mechanisms underlying them: (a) the acquisition of action-outcome associations (internal models of the agent-environment interaction) driven by phasic dopamine signals caused by sudden, unexpected changes in the environment; (b) the transient focussing of visual gaze and actions on salient portions of the environment; (c) the subsequent recall of actions to pursue extrinsic rewards, based on goal-directed reactivation of the representations of their outcomes. The tests of the model, including a series of selective lesions, show how the focussing processes lead to faster learning of action-outcome associations, and how these associations can be recruited to accomplish goal-directed behaviours.
The model, together with the background knowledge reviewed in the paper, represents a framework that can be used to guide the design and interpretation of empirical experiments on IMs, and to computationally validate and further develop theories on them.
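To make processes (a) and (c) concrete, the following is a minimal toy sketch, not the paper's model or equations. It assumes a hypothetical world in which each of three actions switches on one of three lights, and it stands in for the phasic dopamine signal with a rectified prediction error of a learned outcome predictor, so the burst fades as an outcome stops being unexpected. That signal gates a Hebbian (three-factor style) update of action-outcome associations learned during reward-free exploration; goal-directed recall then reactivates a desired outcome and selects the action most strongly associated with it. All names (`environment`, `explore`, `recall`) and parameters (`N`, `ETA_W`, `ETA_P`) are illustrative assumptions.

```python
import numpy as np

N = 3         # hypothetical: 3 actions, each switching on one of 3 lights
ETA_W = 0.5   # assumed learning rate of the action-outcome associations
ETA_P = 0.2   # assumed learning rate of the outcome predictor


def environment(action):
    """Toy world (assumption): action i switches on light i."""
    outcome = np.zeros(N)
    outcome[action] = 1.0
    return outcome


def explore(trials, seed=0):
    """Reward-free exploration driven by a surprise-like dopamine signal."""
    rng = np.random.default_rng(seed)
    W = np.zeros((N, N))  # W[o, a]: association between outcome o and action a
    P = np.zeros((N, N))  # P[o, a]: predicted activity of outcome o after action a
    for _ in range(trials):
        a = int(rng.integers(N))  # random exploratory action, no extrinsic reward
        act = np.zeros(N)
        act[a] = 1.0
        out = environment(a)
        # Stand-in for phasic dopamine: the unexpected part of the outcome.
        da = np.clip(out - P[:, a], 0.0, None)
        # Dopamine-gated Hebbian update: outcome activity x action activity x da.
        W += ETA_W * np.outer(da * out, act)
        # The predictor learns the outcome, so the same event becomes less surprising.
        P[:, a] += ETA_P * (out - P[:, a])
    return W, P


def recall(W, goal):
    """Goal-directed recall: reactivate the desired outcome, pick the linked action."""
    g = np.zeros(N)
    g[goal] = 1.0
    return int(np.argmax(W.T @ g))


W, P = explore(200)
print(recall(W, 2))
```

In this toy world, `recall(W, 2)` should select action 2 after exploration. The design choice worth noting is that learning is self-limiting: once `P` predicts an outcome, the rectified surprise signal, and with it further growth of `W`, falls towards zero, so each association stays below `ETA_W / ETA_P`.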