Hedonic value: enhancing adaptation for motivated agents

Reinforcement learning (RL) in artificial agents is typically used to produce behavioral responses as a function of the reward obtained through interaction with the environment. When the problem consists of learning the shortest path to a goal, it is common to use reward functions that yield a fixed value after each decision, for example a positive value when the target location is reached and a negative value at each intermediate step. However, this fixed strategy may be too simplistic for agents that must adapt to dynamic environments in which resources vary over time. By contrast, there is significant evidence that most living beings internally modulate reward value as a function of their context, thereby expanding their range of adaptivity. Inspired by the potential of this mechanism, we review its underlying processes and introduce a simplified formalization for artificial agents. We test this formalism by monitoring how an agent endowed with a motivated actor–critic model, embedded with our formalization of value and constrained by physiological stability, adapts to environments with different resource distributions. Our main result shows that the manner in which reward is internally processed as a function of the agent's motivational state strongly influences both the adaptivity of the generated behavioral cycles and the agent's physiological stability.
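To make the contrast between fixed and internally modulated reward concrete, here is a minimal Python sketch. It is not the formalization proposed in this work, which the abstract does not specify. It compares a fixed shortest-path reward with a state-dependent ("hedonic") reward and feeds the result to a generic tabular actor–critic. The names `fixed_reward`, `hedonic_reward` and `TabularActorCritic`, the step penalty of -0.01, the set point, and the linear deficit scaling are all hypothetical choices made for illustration.

```python
import math
import random


def fixed_reward(reached_goal: bool, goal_reward: float = 1.0,
                 step_penalty: float = -0.01) -> float:
    """Fixed scheme for shortest-path problems: a positive value at the goal,
    a constant negative value at every intermediate step."""
    return goal_reward if reached_goal else step_penalty


def hedonic_reward(level: float, set_point: float, consumed: bool,
                   step_penalty: float = -0.01) -> float:
    """HYPOTHETICAL state-dependent reward: consuming a resource is worth the
    normalized deficit of the matching physiological variable, so the same
    event is rewarding when depleted and worth nothing when satiated."""
    if not consumed:
        return step_penalty
    deficit = max(0.0, set_point - level)
    return deficit / set_point


class TabularActorCritic:
    """Generic tabular actor-critic: the critic learns state values V and the
    actor learns action preferences H, both driven by the same TD error."""

    def __init__(self, n_states: int, n_actions: int, alpha_critic: float = 0.1,
                 alpha_actor: float = 0.1, gamma: float = 0.9) -> None:
        self.V = [0.0] * n_states
        self.H = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha_critic = alpha_critic
        self.alpha_actor = alpha_actor
        self.gamma = gamma

    def policy(self, s: int) -> list[float]:
        # Softmax over action preferences, shifted by the max for numerical stability.
        m = max(self.H[s])
        exps = [math.exp(h - m) for h in self.H[s]]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, s: int, rng: random.Random) -> int:
        # Sample an action index according to the softmax policy.
        return rng.choices(range(len(self.H[s])), weights=self.policy(s))[0]

    def update(self, s: int, a: int, r: float, s_next: int, terminal: bool) -> float:
        # TD error: delta = r + gamma * V(s') - V(s), with V(s') taken as 0 at episode end.
        target = r if terminal else r + self.gamma * self.V[s_next]
        delta = target - self.V[s]
        self.V[s] += self.alpha_critic * delta    # critic update
        self.H[s][a] += self.alpha_actor * delta  # actor update
        return delta


if __name__ == "__main__":
    agent = TabularActorCritic(n_states=2, n_actions=2)
    # Same consummatory event (state 0, action 1) under two internal states.
    r_depleted = hedonic_reward(level=0.25, set_point=1.0, consumed=True)
    r_satiated = hedonic_reward(level=1.5, set_point=1.0, consumed=True)
    print(r_depleted, r_satiated)  # 0.75 0.0
    agent.update(0, 1, r_depleted, 1, terminal=True)
    print(agent.policy(0))
```

With `fixed_reward`, reaching a resource always yields the same value. With the hypothetical `hedonic_reward`, the reward contributed by the same consummatory action depends on the agent's internal state and vanishes once the variable is at or above its set point. The TD error that trains both actor and critic therefore tracks motivational state rather than a constant payoff.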
