Modeling effects of intrinsic and extrinsic rewards on the competition between striatal learning systems

A common assumption in psychology, economics, and other fields holds that offering extrinsic rewards (such as money) as an incentive improves performance. This principle seems to work well for tasks that require executing the same sequence of steps over and over, with little uncertainty about the process. In other cases, especially where creative problem solving is required because the optimal sequence of actions is difficult to find, external rewards can actually be detrimental to task performance. Furthermore, they can undermine intrinsic motivation to engage in an otherwise interesting activity. In this work, we extend a computational model of the dorsomedial and dorsolateral striatal reinforcement learning systems to account for the effects of extrinsic and intrinsic rewards. The model assumes that the brain employs both a goal-directed and a habitual learning system, and that competition between the two is based on the trade-off between the cost of the reasoning process and the value of information. The goal-directed system elicits internal rewards when its models of the environment improve, while the habitual system, being model-free, does not. Our results account for two phenomena: an initial extrinsic reward leads to less activity after extinction than when no extrinsic reward was offered at all, and performance in complex task settings drops when higher external rewards are promised. We also test the hypothesis that external rewards bias the competition in favor of the computationally efficient, but cruder and less flexible, habitual system, which can negatively influence intrinsic motivation and task performance in the class of tasks we consider.
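The arbitration idea described above can be illustrated with a minimal Python sketch. The abstract gives no equations, so everything here is an assumption made for illustration rather than the authors' model: the names `Arbitration`, `intrinsic_reward`, and `habitual_update`, the simple threshold rule, and the use of a drop in prediction error as a stand-in for "model improvement" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Arbitration:
    """Hypothetical arbitration rule between the two learning systems."""

    deliberation_cost: float  # assumed cost of running the goal-directed system

    def choose(self, value_of_information: float) -> str:
        # Assumption: deliberate only when the expected gain from
        # reasoning exceeds the cost of the reasoning process.
        if value_of_information > self.deliberation_cost:
            return "goal-directed"
        return "habitual"


def intrinsic_reward(error_before: float, error_after: float) -> float:
    """Assumed internal reward: the non-negative drop in the
    goal-directed system's model prediction error."""
    return max(0.0, error_before - error_after)


def habitual_update(q: float, extrinsic_reward: float, alpha: float = 0.1) -> float:
    """Model-free value update driven by extrinsic reward only
    (a standard delta rule, used here as a placeholder)."""
    return q + alpha * (extrinsic_reward - q)
```

Under these assumptions, `Arbitration(deliberation_cost=0.5).choose(0.8)` selects the goal-directed system, while a value of information of 0.2 falls back to the habitual one. Only the goal-directed path can earn `intrinsic_reward`, mirroring the statement that the model-free system elicits no internal reward.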
