A generative spiking neural-network model of goal-directed behaviour and one-step planning

In mammals, goal-directed and planning processes support the flexible behaviour needed to face new situations that cannot be tackled through more efficient but rigid habitual behaviours. Within the Bayesian modelling approach to brain and behaviour, models have been proposed that perform planning as probabilistic inference, but this approach encounters a crucial problem: explaining how such inference might be implemented in brain spiking networks. Recently, the literature has proposed some models that face this problem through recurrent spiking neural networks able to internally simulate state trajectories, the core function at the basis of planning. However, the proposed models have relevant limitations that make them biologically implausible: their world model is trained 'off-line' before solving the target tasks, and it is trained with supervised learning procedures that are neither biologically nor ecologically plausible. Here we propose two novel hypotheses on how the brain might overcome these problems, and operationalise them in a novel architecture pivoting on a spiking recurrent neural network. The first hypothesis allows the architecture to learn the world model in parallel with its use for planning: to this purpose, a new arbitration mechanism decides when to explore, for learning the world model, or when to exploit it, for planning, based on the entropy of the world model itself. The second hypothesis allows the architecture to use an unsupervised learning process to learn the world model by observing the effects of actions. The architecture is validated by reproducing and accounting for the learning profiles and reaction times of human participants learning to solve a visuomotor learning task that is new for them. Overall, the architecture represents the first instance of a model bridging probabilistic planning and spiking processes that has a degree of autonomy analogous to that of real organisms.
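The entropy-based arbitration mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's spiking implementation: the dictionary-based world model, the uniform averaging over actions, and the fixed threshold are all simplifying assumptions made here for clarity.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def arbitrate(world_model, state, threshold):
    """Decide between exploring (to improve the world model) and
    exploiting it (to plan), based on the model's own uncertainty.

    `world_model[state][action]` is the predicted distribution over
    next states; a hypothetical tabular stand-in for the learned model.
    """
    action_dists = world_model[state].values()
    mean_h = sum(entropy(d) for d in action_dists) / len(action_dists)
    return "explore" if mean_h > threshold else "exploit"

# Toy world model over 3 possible next states: still uncertain about
# state "A", already confident about state "B".
wm = {
    "A": {"left": [1/3, 1/3, 1/3], "right": [1/3, 1/3, 1/3]},
    "B": {"left": [0.98, 0.01, 0.01], "right": [0.01, 0.98, 0.01]},
}

print(arbitrate(wm, "A", threshold=0.5))  # high entropy -> "explore"
print(arbitrate(wm, "B", threshold=0.5))  # low entropy  -> "exploit"
```

The key idea is that no external teaching signal is needed: the same learned model that supports planning also signals, through its predictive entropy, when it is not yet reliable enough to plan with.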
