Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes

Instrumental responses are hypothesized to be of two kinds, habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The coexistence of these two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages each has at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible but slow in choice selection, whereas the habitual system is fast in responding but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions, and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that strikes an approximately optimal balance between search time and accuracy in decision making. Behaviourally, the model explains the experimental evidence that responding is sensitive to outcome value at early stages of learning but becomes insensitive at later stages. It also explains why, when two choices with equal incentive values are available concurrently, behaviour remains outcome-sensitive even after extensive training. Moreover, the model accounts for variations in choice reaction time over the course of learning, as well as the experimental observation that reaction time increases with the number of available choices. Neurobiologically, by assuming that the phasic and tonic activities of midbrain dopamine neurons carry the reward-prediction-error and average-reward signals used by the model, respectively, the model predicts that whereas phasic dopamine affects behaviour indirectly, by reinforcing stimulus-response associations, tonic dopamine can affect behaviour directly, by modulating the competition between the habitual and the goal-directed systems and thereby altering reaction time.
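
The arbitration scheme summarized above can be illustrated with a minimal sketch. It assumes, as in Bayesian Q-learning, that the habitual system keeps Gaussian posteriors over cached action values, and that deliberation is engaged for an action only when its value of perfect information (VPI) exceeds the opportunity cost of deliberation, approximated as the average reward rate (the tonic-dopamine-like signal in the model) times the deliberation time. The function and parameter names (`vpi`, `choose`, `plan`, `avg_reward_rate`, `deliberation_time`) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.stats import norm

def vpi(mu, sigma):
    """Value of perfect information per action, assuming Gaussian posteriors
    over cached (habitual) Q-values with means `mu` and std devs `sigma`."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-8)  # avoid division by zero
    order = np.argsort(mu)[::-1]
    best, second = order[0], order[1]
    gains = np.zeros_like(mu)
    for a, (m, s) in enumerate(zip(mu, sigma)):
        if a == best:
            # gain if the apparently best action is in fact worse than the runner-up
            z = (mu[second] - m) / s
        else:
            # gain if this action is in fact better than the apparently best one
            z = (m - mu[best]) / s
        gains[a] = s * (z * norm.cdf(z) + norm.pdf(z))
    return gains

def choose(mu, sigma, avg_reward_rate, deliberation_time, plan):
    """Arbitration sketch: call the goal-directed `plan` for an action only
    when the expected gain from planning (VPI) exceeds the opportunity cost
    of the time planning takes; otherwise use the cached habitual value."""
    cost = avg_reward_rate * deliberation_time   # tonic-dopamine-like cost of deliberating
    gains = vpi(mu, sigma)
    values = np.array(mu, dtype=float)
    for a in range(len(values)):
        if gains[a] > cost:                      # planning is worth its time cost
            values[a] = plan(a)                  # goal-directed (model-based) estimate
    return int(np.argmax(values))
```

On this sketch, large posterior uncertainty early in training yields high VPI, so the slow goal-directed system dominates and behaviour is outcome-sensitive; after extensive training the uncertainty shrinks, cached values win the competition, and responses become fast but outcome-insensitive. A higher average reward rate raises the opportunity cost of deliberating, biasing control toward the habitual system and shortening reaction times.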
