The Computational Development of Reinforcement Learning during Adolescence

Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and for learning from counterfactual feedback. Adolescents and adults performed a novel reinforcement learning task in which they learned the associations between cues and probabilistic outcomes. Outcomes differed in valence (reward versus punishment), and feedback was either partial (only the outcome of the chosen option was displayed) or complete (the outcomes of both the chosen and unchosen options were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike that of adults, adolescents’ performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
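
For illustration only, the three modules named above (a basic reinforcement learning rule, a counterfactual learning module and a value contextualisation module) can be sketched in Python for a single two-option context. This is a minimal sketch under stated assumptions, not the fitted model: the function names, the parameter names (`alpha_factual`, `alpha_counterfactual`, `alpha_context`, `beta`), their default values, and the exact form of the context-value update are all hypothetical choices made for this example.

```python
import math


def choice_probability(q_a, q_b, beta=5.0):
    """Softmax probability of choosing option a over option b.

    beta is a hypothetical inverse-temperature (choice consistency) parameter.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def update_values(q, v, chosen, r_chosen, r_unchosen=None,
                  alpha_factual=0.3, alpha_counterfactual=0.0,
                  alpha_context=0.0):
    """Apply one trial's update and return the new (q, v).

    q          : list of two option values for the current context
    v          : context value (stays at its input value if alpha_context is 0)
    chosen     : index (0 or 1) of the chosen option
    r_chosen   : outcome of the chosen option
    r_unchosen : outcome of the unchosen option, or None under partial feedback
    """
    q = list(q)  # copy so the caller's list is not modified
    unchosen = 1 - chosen

    # Value contextualisation module (assumed form): track the average
    # outcome of the context and learn option values relative to it.
    if alpha_context > 0.0:
        if r_unchosen is None:
            context_outcome = r_chosen  # simplifying assumption
        else:
            context_outcome = 0.5 * (r_chosen + r_unchosen)
        v += alpha_context * (context_outcome - v)

    # Basic module: prediction-error update of the chosen option.
    q[chosen] += alpha_factual * (r_chosen - v - q[chosen])

    # Counterfactual module: also update the unchosen option when its
    # outcome is displayed (complete feedback).
    if r_unchosen is not None and alpha_counterfactual > 0.0:
        q[unchosen] += alpha_counterfactual * (r_unchosen - v - q[unchosen])

    return q, v


# Punishment context, complete feedback: the chosen option avoided a loss
# (outcome 0) while the unchosen option would have lost (outcome -1).
basic_q, basic_v = update_values([0.0, 0.0], 0.0, chosen=0,
                                 r_chosen=0.0, r_unchosen=-1.0)
full_q, full_v = update_values([0.0, 0.0], 0.0, chosen=0,
                               r_chosen=0.0, r_unchosen=-1.0,
                               alpha_counterfactual=0.2, alpha_context=0.5)
print(basic_q, basic_v)
print(full_q, full_v)
```

In this toy punishment trial the basic learner leaves both option values unchanged, because an avoided loss of 0 produces no prediction error. With both additional modules switched on, the context value becomes negative, the avoided loss is treated as a relative gain for the chosen option, and the unchosen option is devalued. This is one way the asymmetry between reward and punishment learning described above could arise, under the assumptions of the sketch.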
