Separating Probability and Reversal Learning in a Novel Probabilistic Reversal Learning Task for Mice

The exploration/exploitation tradeoff (pursuing a known reward versus sampling lesser-known options in the hope of finding a better payoff) is a fundamental aspect of learning and decision making. In humans, it has been studied using multi-armed bandit tasks. The same processes have also been studied using simplified probabilistic reversal learning (PRL) tasks with binary choices. Our investigations suggest that protocols previously used to explore PRL in mice may exceed their cognitive capacities, with animals performing no better than chance. We therefore sought a novel probabilistic learning task that improves behavioral responding in mice while still allowing investigation of the exploration/exploitation tradeoff in decision making. To achieve this, we developed a two-lever operant chamber task in which the levers correspond to different probabilities (high/low) of receiving a saccharin reward, and the reward contingencies associated with the levers are reversed once animals reach a threshold of 80% responding at the high-reward lever. We found that, unlike in existing PRL tasks, mice are able to learn and behave near optimally with 80% high/20% low reward probabilities. Shifting the reward contingencies towards equality showed that some mice still preferred the high-reward lever with probabilities as close as 60% high/40% low. Additionally, we show that animal choice behavior can be effectively modelled using reinforcement learning (RL) models incorporating separate learning rates for positive and negative prediction errors, a perseveration parameter, and a noise parameter. This new decision task, coupled with RL analyses, improves access to the neuroscience of the exploration/exploitation tradeoff in decision making.
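To make the model class concrete, the following is a minimal sketch of one common way to combine the four ingredients named above: a value update with separate learning rates for positive and negative prediction errors, a perseveration bonus for the previously chosen lever, and a softmax choice rule whose inverse temperature plays the role of the noise parameter. The exact functional form, the parameter names (`alpha_pos`, `alpha_neg`, `kappa`, `beta`), the initial values of zero, and the 20-trial window used to evaluate the 80% reversal criterion are all assumptions made for illustration; they are not taken from the abstract and should not be read as the authors' implementation.

```python
import math
import random


def update_value(q, reward, alpha_pos, alpha_neg):
    """Update one lever's value with a sign-dependent learning rate."""
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q + alpha * delta


def prob_lever0(q, beta, kappa, prev_choice):
    """Softmax probability of pressing lever 0 (two-lever case).

    beta  : inverse temperature (lower beta = noisier choices)
    kappa : perseveration bonus added to the previously chosen lever
    """
    stick = 0.0
    if prev_choice == 0:
        stick = kappa
    elif prev_choice == 1:
        stick = -kappa
    logit = beta * (q[0] - q[1]) + stick
    return 1.0 / (1.0 + math.exp(-logit))


def simulate(n_trials, alpha_pos, alpha_neg, beta, kappa,
             p_high=0.8, p_low=0.2, window=20, criterion=0.8, seed=0):
    """Simulate an agent on a two-lever task with criterion-based reversals.

    Assumption: the criterion is checked over the last `window` trials
    since the most recent reversal; the abstract does not specify this.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]       # assumed initial lever values
    high = 0             # index of the currently high-reward lever
    prev = None          # no previous choice on the first trial
    recent = []          # whether recent choices hit the high lever
    choices, reversals = [], 0

    for _ in range(n_trials):
        choice = 0 if rng.random() < prob_lever0(q, beta, kappa, prev) else 1
        p_reward = p_high if choice == high else p_low
        reward = 1.0 if rng.random() < p_reward else 0.0

        q[choice] = update_value(q[choice], reward, alpha_pos, alpha_neg)
        prev = choice
        choices.append(choice)

        recent = (recent + [choice == high])[-window:]
        if len(recent) == window and sum(recent) / window >= criterion:
            high = 1 - high  # reverse the reward contingencies
            recent = []
            reversals += 1

    return choices, reversals


if __name__ == "__main__":
    choices, reversals = simulate(500, alpha_pos=0.4, alpha_neg=0.1,
                                  beta=5.0, kappa=0.5)
    print(f"trials: {len(choices)}, reversals reached: {reversals}")
```

Under these assumptions, fitting the model to an animal's choices would amount to maximising the likelihood of its observed lever presses under `prob_lever0`, and moving `p_high` and `p_low` towards each other (for example 0.6 and 0.4) mimics the manipulation of reward contingencies towards equality.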
