Few-shot learning: temporal scaling in behavioral and dopaminergic learning

How do we learn associations in the world (e.g., between cues and rewards)? Cue-reward associative learning is controlled in the brain by mesolimbic dopamine¹⁻⁴. It is widely believed that dopamine drives such learning by conveying a reward prediction error (RPE) in accordance with temporal difference reinforcement learning (TDRL) algorithms⁵. TDRL implementations are “trial-based”: learning progresses sequentially across individual cue-outcome experiences. Accordingly, a foundational assumption, often considered a mere truism, is that the more cue-reward pairings one experiences, the more one learns this association. Here, we disprove this assumption, thereby falsifying a foundational principle of trial-based learning algorithms. Specifically, when one group of head-fixed mice received ten times fewer experiences than another group over the same total time, a single experience in the first group produced as much learning as ten experiences in the other. This quantitative scaling also holds for mesolimbic dopaminergic learning: the increase in learning rate is so large that the group with fewer experiences exhibits dopaminergic learning in as few as four cue-reward experiences and behavioral learning in nine. An algorithm implementing reward-triggered retrospective learning explains these findings. The temporal scaling and few-shot learning observed here fundamentally change our understanding of the neural algorithms of associative learning.
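The trial-based assumption challenged above can be illustrated with a minimal sketch. It uses a generic Rescorla-Wagner-style delta rule, not the reward-triggered retrospective algorithm reported in the paper, and every number in it (the base learning rate, the trial counts, the tenfold rate scaling) is a hypothetical value chosen only for illustration. In a purely trial-based learner, ten times fewer trials yield far less learning; the groups end up at roughly the same level only if the per-trial learning rate is scaled up by about the same factor.

```python
def delta_rule(n_trials, alpha, reward=1.0, v0=0.0):
    """Associative strength after n_trials of a trial-based delta rule.

    Generic Rescorla-Wagner-style update; NOT the paper's retrospective model.
    """
    v = v0
    for _ in range(n_trials):
        v += alpha * (reward - v)  # step proportional to the reward prediction error
    return v


# Hypothetical parameters, chosen only for illustration.
BASE_ALPHA = 0.02

# Group with many cue-reward experiences.
many_trials = delta_rule(n_trials=100, alpha=BASE_ALPHA)

# Group with ten times fewer experiences, same per-trial learning rate:
# this is what a purely trial-based account would predict.
few_trials_same_rate = delta_rule(n_trials=10, alpha=BASE_ALPHA)

# Same ten experiences, but with the per-trial rate scaled up tenfold
# (an assumed scaling, used here to mimic one experience counting for ten).
few_trials_scaled_rate = delta_rule(n_trials=10, alpha=10 * BASE_ALPHA)

print(f"many trials:             {many_trials:.3f}")
print(f"few trials, same rate:   {few_trials_same_rate:.3f}")
print(f"few trials, scaled rate: {few_trials_scaled_rate:.3f}")
```

Because the delta rule saturates, the match between the first and third cases is approximate rather than exact; the sketch is meant only to show why a fixed per-trial learning rate cannot produce the reported scaling.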

[1]  Stefan Mihalas,et al.  Mesolimbic dopamine release conveys causal associations , 2022, Science.

[2]  Koji Toda,et al.  Pupillary dynamics of mice performing a Pavlovian delay conditioning task reflect reward-predictive signals , 2022, bioRxiv.

[3]  M. Andermann,et al.  Cortical reactivations predict future sensory responses , 2022, bioRxiv.

[4]  P. Shizgal,et al.  Does phasic dopamine release cause policy updates? , 2022, bioRxiv.

[5]  C. Gallistel,et al.  Dopamine encodes real-time reward availability and transitions between reward availability states on different timescales , 2022, Nature Communications.

[6]  K. Kuchibhotla,et al.  Slow or sudden: Re-interpreting the learning curve for modern systems neuroscience , 2022, IBRO neuroscience reports.

[7]  V. M. K. Namboodiri How do real animals account for the passage of time during associative learning? , 2022, Behavioral neuroscience.

[8]  M. Andermann,et al.  History-dependent dopamine release increases cAMP levels in most basal amygdala glutamatergic neurons to control learning. , 2022, Cell reports.

[9]  Marc W. Howard,et al.  Predicting the Future With a Scale-Invariant Temporal Memory for the Past , 2021, Neural Computation.

[10]  T. Stalder,et al.  Pupil dilation as an index of Pavlovian conditioning. A systematic review and meta-analysis , 2021, Neuroscience & Biobehavioral Reviews.

[11]  Cody A. Siciliano,et al.  Dopamine release in the nucleus accumbens core signals perceived saliency , 2021, Current Biology.

[12]  P. Perona,et al.  Mice in a labyrinth show rapid learning, sudden insight, and efficient exploration , 2021, eLife.

[13]  Arif A. Hamid,et al.  Wave-like dopamine dynamics as a mechanism for spatiotemporal credit assignment , 2021, Cell.

[14]  Demetris K. Roumis,et al.  Hippocampal replay reflects specific past experiences rather than a plan for subsequent choice , 2021, Neuron.

[15]  Bryan C. Souza,et al.  Learning differentially shapes prefrontal and hippocampal activity during classical conditioning , 2020, bioRxiv.

[16]  L. Hunninck,et al.  A Field-Based Adaptation of the Classic Morris Water Maze to Assess Learning and Memory in a Free-Living Animal , 2020, Animal Behavior and Cognition.

[17]  Chris A. B. Zajchowski,et al.  Learning, Fast and Slow , 2020, SCHOLE: A Journal of Leisure Studies and Recreation Education.

[18]  P. Shizgal,et al.  Dopamine neurons do not constitute an obligatory stage in the final common path for the evaluation and pursuit of brain stimulation reward , 2020, PloS one.

[19]  Jeffrey D. Zaremba,et al.  Cortical reactivations of recent sensory experiences predict bidirectional network changes during learning , 2020, Nature Neuroscience.

[20]  M. Bouton,et al.  Effects of conditioned stimulus (CS) duration, intertrial interval, and I/T ratio on appetitive Pavlovian conditioning. , 2020, Journal of experimental psychology. Animal learning and cognition.

[21]  Hannah M. Batchelor,et al.  Dopamine transients do not act as model-free prediction errors during associative learning , 2020, Nature Communications.

[22]  Konstantin I Bakhurin,et al.  Temporally restricted dopaminergic control of reward-conditioned movements , 2019, Nature Neuroscience.

[23]  Samuel J. Gershman,et al.  A Unified Framework for Dopamine Signals across Timescales , 2019, Cell.

[24]  Geoffrey Schoenbaum,et al.  Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors , 2019, Nature Neuroscience.

[25]  C. Gallistel,et al.  Number and time in acquisition, extinction and recovery. , 2019, Journal of the experimental analysis of behavior.

[26]  J. Horvitz,et al.  NMDA receptor-dependent plasticity in the nucleus accumbens connects reward-predictive cues to approach responses , 2019, Nature Communications.

[27]  C. Gallistel,et al.  Contingency, contiguity, and causality in conditioning: Applying information theory and Weber's Law to the assignment of credit problem. , 2019, Psychological review.

[28]  A. Heinz,et al.  Pupil dilation as an implicit measure of appetitive Pavlovian learning. , 2019, Psychophysiology.

[29]  Arif A. Hamid,et al.  Dissociable dopamine dynamics for learning and motivation. , 2019, Nature.

[30]  James M. Otis,et al.  Single-cell activity tracking reveals that orbitofrontal neurons acquire and maintain a long-term memory to guide behavioral adaptation , 2019, Nature Neuroscience.

[31]  Supplemental Material for Contingency, Contiguity, and Causality in Conditioning: Applying Information Theory and Weber’s Law to the Assignment of Credit Problem , 2019, Psychological Review.

[32]  Raphael Vallat,et al.  Pingouin: statistics in Python , 2018, J. Open Source Softw.

[33]  Luke T. Coddington,et al.  The timing of action determines reward prediction signals in identified midbrain dopamine neurons , 2018, Nature Neuroscience.

[34]  Krzysztof J. Gorgolewski,et al.  Reward Learning over Weeks Versus Minutes Increases the Neural Representation of Value in the Human Brain , 2018, The Journal of Neuroscience.

[35]  Richard S. Sutton,et al.  Associative Learning from Replayed Experience , 2017, bioRxiv.

[36]  N. Uchida,et al.  Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice , 2016, eLife.

[37]  K. Wassum,et al.  Nucleus accumbens core dopamine signaling tracks the need‐based motivational value of food‐paired cues , 2016, Journal of neurochemistry.

[38]  Paul Smolen,et al.  The right time to learn: mechanisms and optimization of spaced learning , 2016, Nature Reviews Neuroscience.

[39]  Vaughn L. Hetrick,et al.  Mesolimbic Dopamine Signals the Value of Work , 2015, Nature Neuroscience.

[40]  Talia N. Lerner,et al.  Intact-Brain Analyses Reveal Distinct Information Carried by SNc Dopamine Subcircuits , 2015, Cell.

[41]  P. Glimcher,et al.  Phasic Dopamine Release in the Rat Nucleus Accumbens Symmetrically Encodes a Reward Prediction Error Term , 2014, The Journal of Neuroscience.

[42]  Hermann Ebbinghaus (1885) Memory: A Contribution to Experimental Psychology , 2013, Annals of Neurosciences.

[43]  Josiah R. Boivin,et al.  A Causal Link Between Prediction Errors, Dopamine Neurons and Learning , 2013, Nature Neuroscience.

[44]  P. Phillips,et al.  Dopamine Encoding of Pavlovian Incentive Stimuli Diminishes with Extended Training , 2013, The Journal of Neuroscience.

[45]  Marc W. Howard,et al.  A Scale-Invariant Internal Representation of Time , 2012, Neural Computation.

[46]  Anne E Carpenter,et al.  Neuron-type specific signals for reward and punishment in the ventral tegmental area , 2011, Nature.

[47]  Ryan D Ward,et al.  CS Informativeness Governs CS-US Associability , 2012 .

[48]  K. Berridge Faculty Opinions recommendation of A selective role for dopamine in stimulus-reward learning. , 2011 .

[49]  Margaret F. Carr,et al.  Hippocampal replay in the awake state: a potential physiological substrate of memory consolidation and retrieval , 2011 .

[50]  Matthijs A. A. van der Meer,et al.  Hippocampal Replay Is Not a Simple Function of Experience , 2010, Neuron.

[51]  Richard S. Sutton,et al.  Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System , 2008, Neural Computation.

[52]  M. Bouton,et al.  Analysis of a trial-spacing effect with relatively long intertrial intervals , 2008, Learning & behavior.

[53]  Daniel A. Gottlieb Is the number of trials a primary determinant of conditioned responding? , 2008, Journal of experimental psychology. Animal behavior processes.

[54]  G. Buzsáki,et al.  Forward and reverse hippocampal place-cell sequences during ripples , 2007, Nature Neuroscience.

[55]  R. Wightman,et al.  Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens , 2007, Nature Neuroscience.

[56]  David J. Foster,et al.  Reverse replay of behavioural sequences in hippocampal place cells during the awake state , 2006, Nature.

[57]  P. Glimcher,et al.  Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal , 2005, Neuron.

[58]  C. Gallistel,et al.  The learning curve: implications of a quantitative analysis. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[59]  M. Bouton,et al.  Memory priming and trial spacing effects in Pavlovian learning , 2004, Learning & behavior.

[60]  R. Wightman,et al.  Subsecond dopamine release promotes cocaine seeking , 2003, Nature.

[61]  W. Schultz,et al.  Dopamine responses comply with basic assumptions of formal learning theory , 2001, Nature.

[62]  R. Menzel,et al.  Massed and spaced learning in honeybees: the role of CS, US, the intertrial interval, and the test interval. , 2001, Learning & memory.

[63]  Charles H. Shea,et al.  Spacing practice sessions across days benefits the learning of motor skills , 2000 .

[64]  P. Holland Trial and intertrial durations in appetitive conditioning in rats , 2000 .

[65]  C. D. Beck,et al.  Learning Performance of Normal and Mutant Drosophila after Repeated Conditioning Trials with Discrete Stimuli , 2000, The Journal of Neuroscience.

[66]  C. Gallistel,et al.  Time, rate, and conditioning. , 2000, Psychological review.

[67]  K. Lattal,et al.  Trial and intertrial durations in Pavlovian conditioning: issues of learning and performance. , 1999, Journal of experimental psychology. Animal behavior processes.

[68]  T. Carew,et al.  Differential induction of long-term synaptic facilitation by spaced and massed applications of serotonin at sensory neuron synapses of Aplysia californica. , 1998, Learning & memory.

[69]  Peter Dayan,et al.  A Neural Substrate of Prediction and Reward , 1997, Science.

[70]  M. Fanselow Neural organization of the defensive behavior system responsible for fear , 1994, Psychonomic bulletin & review.

[71]  Richard S. Sutton,et al.  Time-Derivative Models of Pavlovian Reinforcement , 1990 .

[72]  M. Fanselow,et al.  Contextual conditioning with massed versus distributed unconditional stimuli in the absence of explicit conditional stimuli. , 1988, Journal of experimental psychology. Animal behavior processes.

[73]  J. Pearce,et al.  A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. , 1980, Psychological review.

[74]  N. Mackintosh Overshadowing and stimulus intensity , 1976, Animal learning & behavior.

[75]  J. Gibbon,et al.  Temporal factors influencing the acquisition and maintenance of an autoshaped keypeck , 1975 .

[76]  R. Rescorla,et al.  A theory of Pavlovian conditioning : Variations in the effectiveness of reinforcement and nonreinforcement , 1972 .

[77]  H. M. Bruce An Exteroceptive Block to Pregnancy in the Mouse , 1959, Nature.

[78]  B. Reynolds The acquisition of a trace conditioned response as a function of the magnitude of the stimulus trace. , 1945 .