Spiking neurons can discover predictive features by aggregate-label learning

Credit assignment in the brain

To discover relevant clues for survival, an organism must bridge the gap between the short time periods in which a clue occurs and the potentially long waiting times after which feedback arrives. This so-called temporal credit-assignment problem is also a major challenge in machine learning. Gütig developed a representation of the responses of spiking neurons whose derivative defines the direction along which a neuron's response changes most rapidly. By using a learning rule that follows this direction, the temporal credit-assignment problem can be solved by training a neuron to match its number of output spikes to the number of clues. The same learning rule endows unsupervised neural networks with powerful learning capabilities.

Science, this issue p. 10.1126/science.aab4113

Neurons learn to associate related sensory stimuli even when they are dispersed across time and space.

INTRODUCTION

Opportunities and dangers can often be predicted on the basis of sensory clues. The attack of a predator, for example, may be preceded by the sounds of breaking twigs or whiffs of odor. Life is easier if one learns these clues. However, this is difficult when clues are hidden within distracting streams of unrelated sensory activity. Even worse, they can be separated from the events that they predict by long and variable delays. To discover those clues, a learning procedure must bridge the gap between the short epochs within which clues occur and the time when feedback arrives. This “temporal credit-assignment problem” is a core challenge in biological and machine learning.

RATIONALE

A neural detector of a sensory clue should fire whenever the clue occurs but remain silent otherwise. Hence, the number of output spikes of this neuron should be proportional to the number of times that the clue occurred. The reversal of this observation is the core hypothesis of this study: a neuron can identify an unknown clue when it is trained to fire in proportion to the clue’s number of occurrences. This “aggregate-label” hypothesis entails that when a neuron is trained to match its number of output spikes to the magnitude of a feedback signal, it will identify a set of clues within its input activity whose occurrences predict the feedback. This learning requires knowledge of neither the times nor the absolute number of the individual clues.

RESULTS

To implement aggregate-label learning, I calculated how neurons should modify their synaptic efficacies in order to most effectively adjust their number of output spikes. Because a neuron’s discrete number of spikes does not provide a direction of gradual improvement, I derived the multi-spike tempotron learning rule in an abstract space of continuous spike-threshold variables. In this space, changes in synaptic efficacies are directed along the steepest path that reduces the discrepancy between a neuron’s fixed biological spike threshold and the closest hypothetical threshold at which the neuron would fire the desired number of spikes. With the resulting synaptic learning rule, aggregate-label learning enabled simple neuron models to solve the temporal credit-assignment problem. Neurons reliably identified all clues whose occurrences contributed to a delayed feedback signal. For instance, a neuron could learn to respond with different numbers of spikes to individual clues without being told how many different clues existed, when they occurred, or how much each one of them contributed to the feedback.
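The paper obtains the exact learning rule as a gradient through these hypothetical threshold values; that derivation is beyond this summary. As a minimal sketch of the aggregate-label setup itself, the following snippet simulates a tempotron-style neuron (double-exponential PSP kernel with a threshold-triggered exponential reset) and nudges its weights with a deliberately simplified heuristic so that its spike count moves toward a feedback count. The constants, function names, and the critical-time update are illustrative assumptions, not the paper's multi-spike tempotron rule.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Tempotron-style voltage model; all constants are illustrative choices,
# not values taken from the paper.
TAU_M, TAU_S = 20.0, 5.0      # membrane and synaptic time constants (ms)
DT, T = 1.0, 500.0            # simulation step and trial duration (ms)
THETA = 1.0                   # fixed biological spike threshold

def psp(dt):
    """Postsynaptic potential kernel; zero for dt < 0."""
    dt = np.asarray(dt, dtype=float)
    return np.where(dt >= 0, np.exp(-dt / TAU_M) - np.exp(-dt / TAU_S), 0.0)

def run_neuron(w, inputs):
    """Simulate one trial. inputs[i] holds synapse i's spike times.
    Returns the time axis, the voltage trace, and output spike times."""
    t_axis = np.arange(0.0, T, DT)
    v = np.zeros_like(t_axis)
    for w_i, times in zip(w, inputs):
        for t_sp in times:
            v += w_i * psp(t_axis - t_sp)
    out = []
    for k, t in enumerate(t_axis):
        if v[k] >= THETA:
            out.append(t)
            # threshold-triggered reset applied to the remaining trace
            v -= THETA * np.where(t_axis >= t,
                                  np.exp(-(t_axis - t) / TAU_M), 0.0)
    return t_axis, v, out

def train_step(w, inputs, target, lr=0.02):
    """One aggregate-label update: move the spike count toward `target`.
    Heuristic stand-in for the paper's threshold gradient: each synapse
    is nudged by its PSP contribution at a single critical time."""
    t_axis, v, out = run_neuron(w, inputs)
    n = len(out)
    if n == target:
        return w
    if n < target:
        t_c, sign = t_axis[np.argmax(v)], +1.0  # extra spike is cheapest here
    else:
        t_c, sign = out[-1], -1.0               # remove the last surplus spike
    for i, times in enumerate(inputs):
        w[i] += sign * lr * sum(float(psp(t_c - t_sp)) for t_sp in times)
    return w

# Demo: a hidden 'clue' spike pattern is embedded 0-3 times per trial in
# Poisson background noise; the feedback is only the number of
# embeddings, never their timing.
N_SYN = 100
clue = [rng.uniform(0, 50, size=rng.integers(1, 3)) for _ in range(N_SYN)]

def make_trial():
    k = int(rng.integers(0, 4))                  # hidden occurrence count
    offsets = rng.uniform(0, T - 50, size=k)     # hidden occurrence times
    inputs = [np.concatenate([rng.uniform(0, T, size=rng.poisson(3))]
                             + [clue[i] + o for o in offsets])
              for i in range(N_SYN)]
    return inputs, k

w = rng.normal(0.0, 0.01, size=N_SYN)
for _ in range(1000):
    inputs, k = make_trial()
    w = train_step(w, inputs, target=k)
```

Note that the training signal in this toy task is only the aggregate count k per trial; the neuron is never told when, or even whether, the clue pattern occurred at any particular time.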
This learning was robust to high levels of feedback noise and input noise, and it performed well on a connected speech-recognition task. Aggregate-label learning also enabled populations of neurons to solve unsupervised learning tasks by relying on internally generated feedback signals that amplified correlations between the neurons’ output spike counts. These self-supervised networks discovered recurring constellations of input patterns even when these were rare and distributed over spatial and temporal scales that exceeded the receptive fields of individual neurons. Because learning in self-supervised networks is driven by aggregate numbers of feature occurrences, it does not require temporal alignment of the input activities of individual neurons. When competitive interactions between individual neurons were mediated through the internal feedback circuit, feature maps could form even when the features’ asynchrony incapacitated lateral inhibition.

CONCLUSION

Aggregate-label learning solves the long-standing problem of how neural systems can identify features within their input activity that predict delayed feedback. This solution strongly enhances the known learning capabilities of simple neural circuit models. Because the feedback can be external or internal, these enhancements apply to both supervised and unsupervised learning. In this framework, both forms of learning converge onto the same rule of synaptic plasticity, inviting future research on how they cooperate when brains learn.

Figure: Membrane potential traces of a model neuron before (top trace) and after (second through fifth traces) learning to detect reward-predictive sensory clues (colored squares). Each clue occurrence is represented as a spike pattern within the neuron’s input activity (raster plot). After learning, the number of output spikes (vertical deflections) elicited by each clue encodes the clue’s contribution to a delayed reward.

ABSTRACT

The brain routinely discovers sensory clues that predict opportunities or dangers. However, it is unclear how neural learning processes can bridge the typically long delays between sensory clues and behavioral outcomes. Here, I introduce a learning concept, aggregate-label learning, that enables biologically plausible model neurons to solve this temporal credit-assignment problem. Aggregate-label learning matches a neuron’s number of output spikes to a feedback signal that is proportional to the number of clues but carries no information about their timing. Aggregate-label learning outperforms stochastic reinforcement learning at identifying predictive clues and is able to solve unsegmented speech-recognition tasks. Furthermore, it allows unsupervised neural networks to discover recurring constellations of sensory features even when they are widely dispersed across space and time.
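The internal feedback circuit that drives the self-supervised networks is characterized above only at summary level: it amplifies correlations between the neurons' output spike counts. To convey the flavor in code, the toy rule below (an illustrative assumption, not the paper's circuit) sets each neuron's target spike count from the consensus of the other neurons, so that agreement in aggregate counts, rather than spike timing, is what gets reinforced.

```python
import numpy as np

def internal_feedback(counts):
    """Toy self-supervised targets: push each neuron's spike count toward
    the mean count of the *other* neurons. Agreement across the population
    is reinforced using aggregate counts only; no temporal alignment of
    the neurons' inputs is needed. (Illustrative assumption, not the
    paper's feedback circuit.)"""
    counts = np.asarray(counts, dtype=float)
    others_mean = (counts.sum() - counts) / (len(counts) - 1)
    return np.rint(others_mean).astype(int)

# Two neurons already agree on two feature occurrences; the dissenter
# is pulled toward the consensus:
print(internal_feedback([2, 2, 1]))   # -> [2 2 2]
```

A target produced this way could then drive any per-neuron spike-count learner, such as the train_step sketch above, in place of an externally supplied feedback signal.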
