Goal-Directed Decision Making with Spiking Neurons

Behavioral and neuroscientific data on reward-based decision making point to a fundamental distinction between habitual and goal-directed action selection. The formation of habits, which requires only simple updating of cached values, has been studied in great detail, and the reward prediction error theory of dopamine function has enjoyed prominent success in accounting for its neural bases. In contrast, the neural circuit mechanisms of goal-directed decision making, which requires extended iterative computations to estimate values online, are still unknown. Here we present a spiking neural network that provably solves the difficult online value estimation problem underlying goal-directed decision making in a near-optimal way and reproduces behavioral as well as neurophysiological data on tasks ranging from simple binary choice to sequential decision making. Our model uses local plasticity rules to learn the synaptic weights of a simple neural network that achieves optimal performance, and it solves one-step decision-making tasks, commonly considered in neuroeconomics, as well as more challenging sequential decision-making tasks, within 1 s. These decision times, their dependence on task parameters, and the final choice probabilities all match behavioral data, while the evolution of neural activity in the network closely mimics neural responses recorded in frontal cortices during the execution of such tasks. Our theory provides a principled framework for understanding the neural underpinnings of goal-directed decision making and makes novel predictions for sequential decision-making tasks with multiple rewards.

SIGNIFICANCE STATEMENT

Goal-directed actions requiring prospective planning pervade decision making, but their circuit-level mechanisms remain elusive. We show how a model circuit of biologically realistic spiking neurons can solve this computationally challenging problem in a novel way. The synaptic weights of our network can be learned using local plasticity rules such that its dynamics devise a near-optimal plan of action. By systematically comparing our model results to experimental data, we show that it reproduces behavioral decision times and choice probabilities as well as neural responses in a rich set of tasks. Our results thus offer the first biologically realistic account of complex goal-directed decision making at the computational, algorithmic, and implementational levels.
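
To make the computational problem concrete: goal-directed (model-based) action selection requires solving the Bellman equations of the underlying Markov decision process online, rather than simply reading out cached values. The sketch below shows the standard value-iteration computation that any such system must approximate. It is our illustration of the generic algorithm, not the paper's spiking implementation, and the names (P, R, gamma) are placeholder conventions of ours.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Generic value iteration for a finite Markov decision process.

    P : transition tensor, shape (A, S, S); P[a, s, s'] = Pr(s' | s, a)
    R : expected immediate reward, shape (A, S); R[a, s] = E[r | s, a]
    Returns the optimal state values (shape (S,)) and a greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q[a, s] = R[a, s] + gamma * E[V(s') | s, a]
        Q = R + gamma * (P @ V)      # (A, S, S) @ (S,) -> (A, S)
        V_new = Q.max(axis=0)        # value of the best action in each state
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

For a one-step "neuroeconomic" choice this reduces to a single comparison of expected rewards, whereas the sequential tasks considered in the paper require the full iterative loop. In the model summarized above, an analogous iterative estimation is carried out by the recurrent dynamics of the spiking network itself, with the task model effectively stored in synaptic weights learned through local plasticity rules.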
