Controllability governs the balance between Pavlovian and instrumental action selection

A Pavlovian bias to approach reward-predictive cues and avoid punishment-predictive cues can conflict with instrumentally optimal actions. While most previous work has assumed that this bias is a fixed trait, we argue that it can vary within an individual. In particular, we propose that the brain arbitrates between Pavlovian and instrumental control by inferring which is the better predictor of reward. The instrumental predictor is more flexible: it can learn values that depend on both stimuli and actions, whereas the Pavlovian predictor learns values that depend only on stimuli. The arbitration theory predicts that the Pavlovian predictor will be favored when rewards are relatively uncontrollable, because the additional flexibility of the instrumental predictor is then not useful. Consistent with this hypothesis, we find that the Pavlovian approach bias is stronger in low-control than in high-control contexts.
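The arbitration idea can be illustrated with a minimal sketch. It is not the authors' model: it assumes binary rewards, Beta-Bernoulli learners with uniform priors, and equal prior belief in the two predictors. The names `BetaBernoulli`, `pavlovian_weight`, and `simulate`, along with the reward probabilities (0.9/0.1 when controllable, 0.5 otherwise), are hypothetical choices made only for illustration. The Pavlovian predictor keys its estimates on the stimulus alone, the instrumental predictor on the stimulus-action pair, and the arbitration weight is the posterior probability of the Pavlovian predictor given the rewards observed so far.

```python
import math
import random


class BetaBernoulli:
    """Beta-Bernoulli reward predictor; one Beta posterior per key (assumed form)."""

    def __init__(self, prior_a=1.0, prior_b=1.0):
        self.prior = (prior_a, prior_b)
        self.counts = {}

    def predict(self, key):
        # Posterior mean probability of reward for this key.
        a, b = self.counts.get(key, self.prior)
        return a / (a + b)

    def update(self, key, reward):
        a, b = self.counts.get(key, self.prior)
        self.counts[key] = (a + reward, b + 1 - reward)


def pavlovian_weight(trials):
    """Posterior probability of the Pavlovian predictor after all trials.

    `trials` is a list of (stimulus, action, reward) tuples with reward in {0, 1}.
    """
    pavlovian = BetaBernoulli()     # values depend on the stimulus only
    instrumental = BetaBernoulli()  # values depend on stimulus and action
    log_odds = 0.0                  # equal prior belief in both predictors
    for stimulus, action, reward in trials:
        p_pav = pavlovian.predict(stimulus)
        p_ins = instrumental.predict((stimulus, action))
        lik_pav = p_pav if reward else 1.0 - p_pav
        lik_ins = p_ins if reward else 1.0 - p_ins
        # Accumulate the log Bayes factor from one-step-ahead predictions.
        log_odds += math.log(lik_pav) - math.log(lik_ins)
        pavlovian.update(stimulus, reward)
        instrumental.update((stimulus, action), reward)
    # Numerically stable logistic function of the log odds.
    return 0.5 * (1.0 + math.tanh(log_odds / 2.0))


def simulate(controllable, n_trials=200, seed=0):
    """Generate trials for one cue with random go/no-go actions (toy environment)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        action = rng.choice(["go", "nogo"])
        if controllable:
            p_reward = 0.9 if action == "go" else 0.1  # reward depends on action
        else:
            p_reward = 0.5                             # reward ignores action
        reward = 1 if rng.random() < p_reward else 0
        trials.append(("cue", action, reward))
    return trials


weight_low_control = pavlovian_weight(simulate(controllable=False))
weight_high_control = pavlovian_weight(simulate(controllable=True))
```

In this toy setting, `weight_low_control` exceeds `weight_high_control`: when rewards ignore the action, the extra action-specific parameters of the instrumental predictor buy no predictive accuracy, so the simpler Pavlovian predictor receives relatively more weight.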
