Decision theory, reinforcement learning, and the brain

Decision making is a core competence for animals and humans acting and surviving in environments they only partially comprehend, gaining rewards and punishments for their troubles. Decision-theoretic concepts permeate experiments and computational models in ethology, psychology, and neuroscience. Here, we review a well-known, coherent Bayesian approach to decision making, show how it unifies issues in Markovian decision problems, signal detection psychophysics, sequential sampling, and optimal exploration, and discuss paradigmatic psychological and neural examples of each problem. We discuss computational issues concerning what subjects know about their task and how ambitious they are in seeking optimal solutions; we address algorithmic topics concerning model-based and model-free methods for making choices; and we highlight key aspects of the neural implementation of decision making.
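To make the model-based versus model-free distinction mentioned above concrete, here is a minimal illustrative sketch (not drawn from the paper): a tiny two-state, two-action Markov decision problem in which a model-based learner solves the Bellman equations using a known transition model, while a model-free learner reaches similar action values purely from sampled transitions via temporal-difference (Q-learning) updates. All quantities below (the transition matrix `P`, rewards `R`, learning rate `alpha`, and discount `gamma`) are hypothetical choices for illustration.

```python
# Illustrative sketch only: contrasting model-based value iteration with
# model-free Q-learning on a hypothetical two-state, two-action MDP.
import numpy as np

n_states, n_actions = 2, 2
gamma = 0.9                      # discount factor (assumed)
P = np.array([[[0.8, 0.2],       # P[s, a, s']: assumed transition model
               [0.1, 0.9]],
              [[0.9, 0.1],
               [0.2, 0.8]]])
R = np.array([[1.0, 0.0],        # R[s, a]: assumed immediate reward
              [0.0, 2.0]])

# Model-based: value iteration consults the known model (P, R) directly.
Q_mb = np.zeros((n_states, n_actions))
for _ in range(100):
    V = Q_mb.max(axis=1)
    Q_mb = R + gamma * P @ V     # Bellman optimality backup

# Model-free: Q-learning updates from sampled experience, never using P.
rng = np.random.default_rng(0)
Q_mf = np.zeros((n_states, n_actions))
alpha, s = 0.1, 0                # learning rate and start state (assumed)
for _ in range(20000):
    # epsilon-greedy choice between exploration and exploitation
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q_mf[s].argmax())
    s_next = rng.choice(n_states, p=P[s, a])
    td_error = R[s, a] + gamma * Q_mf[s_next].max() - Q_mf[s, a]
    Q_mf[s, a] += alpha * td_error
    s = s_next

print(np.round(Q_mb, 2))         # values computed from the model
print(np.round(Q_mf, 2))         # sampled values; roughly agree with Q_mb
```

The point of the sketch is only that the two routes trade off differently: the model-based sweep needs the transition and reward structure but adapts immediately if that structure changes, whereas the model-free learner is cheap to run per decision but must relearn its cached values from new experience.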
