A generative spiking neural-network model of goal-directed behaviour and one-step planning

In mammals, goal-directed and planning processes support the flexible behaviour needed to face new situations that cannot be tackled through more efficient but rigid habitual behaviours. Within the Bayesian modelling approach to brain and behaviour, models have been proposed that perform planning as probabilistic inference, but this approach encounters a crucial problem: explaining how such inference might be implemented in brain spiking networks. Recently, the literature has proposed some models that face this problem through recurrent spiking neural networks able to internally simulate state trajectories, the core function at the basis of planning. However, the proposed models have relevant limitations that make them biologically implausible: their world model is trained 'off-line' before solving the target tasks, and it is trained with supervised learning procedures that are neither biologically nor ecologically plausible. Here we propose two novel hypotheses on how the brain might overcome these problems, and operationalise them in a novel architecture pivoting on a spiking recurrent neural network. The first hypothesis allows the architecture to learn the world model in parallel with its use for planning: to this purpose, a new arbitration mechanism decides when to explore, for learning the world model, or when to exploit it, for planning, based on the entropy of the world model itself. The second hypothesis allows the architecture to use an unsupervised learning process to learn the world model by observing the effects of actions. The architecture is validated by reproducing and accounting for the learning profiles and reaction times of human participants learning to solve a visuomotor learning task that is new for them. Overall, the architecture represents the first instance of a model bridging probabilistic planning and spiking processes that has a degree of autonomy analogous to that of real organisms.
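The entropy-based arbitration mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's spiking implementation: the dictionary-based world model, the uniform averaging over actions, and the fixed threshold are all simplifying assumptions made here for clarity.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def arbitrate(world_model, state, threshold):
    """Decide between exploring (to improve the world model) and
    exploiting it (to plan), based on the model's own uncertainty.

    `world_model[state][action]` is the predicted distribution over
    next states; a hypothetical tabular stand-in for the learned model.
    """
    action_dists = world_model[state].values()
    mean_h = sum(entropy(d) for d in action_dists) / len(action_dists)
    return "explore" if mean_h > threshold else "exploit"

# Toy world model over 3 possible next states: still uncertain about
# state "A", already confident about state "B".
wm = {
    "A": {"left": [1/3, 1/3, 1/3], "right": [1/3, 1/3, 1/3]},
    "B": {"left": [0.98, 0.01, 0.01], "right": [0.01, 0.98, 0.01]},
}

print(arbitrate(wm, "A", threshold=0.5))  # high entropy -> "explore"
print(arbitrate(wm, "B", threshold=0.5))  # low entropy  -> "exploit"
```

The key idea is that no external teaching signal is needed: the same learned model that supports planning also signals, through its predictive entropy, when it is not yet reliable enough to plan with.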
