Navigating complex decision spaces: Problems and paradigms in sequential choice.

To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action arrive only after a delay, which raises the problem of temporal credit assignment: when feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that make up the sequence? Research in reinforcement learning offers two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment, including second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify the neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia; the latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior, but they also provide a useful framework for understanding neural reward valuation and action selection.
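
As a concrete illustration of the model-free solution discussed above, the sketch below applies temporal-difference (Q-learning-style) updates combined with eligibility traces to a toy chain task in which reward arrives only at the final state. The task, the parameter values, and the specific update rule are illustrative assumptions on our part rather than material from the review; the point is simply how a single delayed reward is propagated backward so that earlier actions in the sequence receive credit.

```python
# Minimal sketch (illustrative, not the authors' method): model-free TD learning
# with eligibility traces solving a delayed-reward credit assignment problem.
import random

N_STATES = 4           # states 0..3; reward is delivered only on reaching state 3
ACTIONS = [0, 1]       # 0 = stay (no progress), 1 = advance toward the goal
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.95, 0.9, 0.1   # assumed parameter values

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Advance the chain: only 'advance' moves forward; reward only at the end."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else state
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def choose(state):
    """Epsilon-greedy action selection over the current Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(500):
    state = 0
    traces = {key: 0.0 for key in Q}   # eligibility traces, reset each episode
    done = False
    while not done:
        action = choose(state)
        next_state, reward, done = step(state, action)
        # TD error: discrepancy between obtained and predicted value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        delta = reward + (0.0 if done else GAMMA * best_next) - Q[(state, action)]
        traces[(state, action)] += 1.0
        # The traces spread the single delayed reward back over earlier state-action pairs.
        for key in Q:
            Q[key] += ALPHA * delta * traces[key]
            traces[key] *= GAMMA * LAMBDA
        state = next_state

# Values of the 'advance' action grow with proximity to the rewarded goal state.
print({k: round(v, 2) for k, v in Q.items() if k[1] == 1})
```

A model-based learner would approach the same task differently: rather than caching action values shaped by a backward-propagating error signal, it would learn the transition structure of the chain and plan over that internal model to select actions.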
