Spiking neurons can discover predictive features by aggregate-label learning

Credit assignment in the brain

To discover relevant clues for survival, an organism must bridge the gap between the short time periods in which a clue occurs and the potentially long waiting times after which feedback arrives. This so-called temporal credit-assignment problem is also a major challenge in machine learning. Gütig developed a representation of the responses of spiking neurons whose derivative defines the direction along which a neuron's response changes most rapidly. By using a learning rule that follows this direction, the temporal credit-assignment problem can be solved by training a neuron to match its number of output spikes to the number of clues. The same learning rule endows unsupervised neural networks with powerful learning capabilities.

Science, this issue p. 10.1126/science.aab4113

Neurons learn to associate related sensory stimuli even when they are dispersed across time and space.

INTRODUCTION

Opportunities and dangers can often be predicted on the basis of sensory clues. The attack of a predator, for example, may be preceded by the sounds of breaking twigs or whiffs of odor. Life is easier if one learns these clues. However, this is difficult when clues are hidden within distracting streams of unrelated sensory activity. Even worse, they can be separated from the events that they predict by long and variable delays. To discover those clues, a learning procedure must bridge the gap between the short epochs within which clues occur and the time when feedback arrives. This “temporal credit-assignment problem” is a core challenge in biological and machine learning.

RATIONALE

A neural detector of a sensory clue should fire whenever the clue occurs but remain silent otherwise. Hence, the number of output spikes of this neuron should be proportional to the number of times that the clue occurred. The reversal of this observation is the core hypothesis of this study: a neuron can identify an unknown clue when it is trained to fire in proportion to the clue’s number of occurrences. This “aggregate-label” hypothesis entails that when a neuron is trained to match its number of output spikes to the magnitude of a feedback signal, it will identify a set of clues within its input activity whose occurrences predict the feedback. This learning requires knowledge of neither the times nor the absolute number of the individual clues.

RESULTS

To implement aggregate-label learning, I calculated how neurons should modify their synaptic efficacies in order to most effectively adjust their number of output spikes. Because a neuron’s discrete number of spikes does not provide a direction of gradual improvement, I derived the multi-spike tempotron learning rule in an abstract space of continuous spike-threshold variables. In this space, changes in synaptic efficacies are directed along the steepest path that reduces the discrepancy between a neuron’s fixed biological spike threshold and the closest hypothetical threshold at which the neuron would fire the desired number of spikes. With the resulting synaptic learning rule, aggregate-label learning enabled simple neuron models to solve the temporal credit-assignment problem. Neurons reliably identified all clues whose occurrences contributed to a delayed feedback signal. For instance, a neuron could learn to respond with different numbers of spikes to individual clues without being told how many different clues existed, when they occurred, or how much each one of them contributed to the feedback.
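The paper obtains the exact learning rule as a gradient through these hypothetical threshold values; that derivation is beyond this summary. As a minimal sketch of the aggregate-label setup itself, the following snippet simulates a tempotron-style neuron (double-exponential PSP kernel with a threshold-triggered exponential reset) and nudges its weights with a deliberately simplified heuristic so that its spike count moves toward a feedback count. The constants, function names, and the critical-time update are illustrative assumptions, not the paper's multi-spike tempotron rule.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Tempotron-style voltage model; all constants are illustrative choices,
# not values taken from the paper.
TAU_M, TAU_S = 20.0, 5.0      # membrane and synaptic time constants (ms)
DT, T = 1.0, 500.0            # simulation step and trial duration (ms)
THETA = 1.0                   # fixed biological spike threshold

def psp(dt):
    """Postsynaptic potential kernel; zero for dt < 0."""
    dt = np.asarray(dt, dtype=float)
    return np.where(dt >= 0, np.exp(-dt / TAU_M) - np.exp(-dt / TAU_S), 0.0)

def run_neuron(w, inputs):
    """Simulate one trial. inputs[i] holds synapse i's spike times.
    Returns the time axis, the voltage trace, and output spike times."""
    t_axis = np.arange(0.0, T, DT)
    v = np.zeros_like(t_axis)
    for w_i, times in zip(w, inputs):
        for t_sp in times:
            v += w_i * psp(t_axis - t_sp)
    out = []
    for k, t in enumerate(t_axis):
        if v[k] >= THETA:
            out.append(t)
            # threshold-triggered reset applied to the remaining trace
            v -= THETA * np.where(t_axis >= t,
                                  np.exp(-(t_axis - t) / TAU_M), 0.0)
    return t_axis, v, out

def train_step(w, inputs, target, lr=0.02):
    """One aggregate-label update: move the spike count toward `target`.
    Heuristic stand-in for the paper's threshold gradient: each synapse
    is nudged by its PSP contribution at a single critical time."""
    t_axis, v, out = run_neuron(w, inputs)
    n = len(out)
    if n == target:
        return w
    if n < target:
        t_c, sign = t_axis[np.argmax(v)], +1.0  # extra spike is cheapest here
    else:
        t_c, sign = out[-1], -1.0               # remove the last surplus spike
    for i, times in enumerate(inputs):
        w[i] += sign * lr * sum(float(psp(t_c - t_sp)) for t_sp in times)
    return w

# Demo: a hidden 'clue' spike pattern is embedded 0-3 times per trial in
# Poisson background noise; the feedback is only the number of
# embeddings, never their timing.
N_SYN = 100
clue = [rng.uniform(0, 50, size=rng.integers(1, 3)) for _ in range(N_SYN)]

def make_trial():
    k = int(rng.integers(0, 4))                  # hidden occurrence count
    offsets = rng.uniform(0, T - 50, size=k)     # hidden occurrence times
    inputs = [np.concatenate([rng.uniform(0, T, size=rng.poisson(3))]
                             + [clue[i] + o for o in offsets])
              for i in range(N_SYN)]
    return inputs, k

w = rng.normal(0.0, 0.01, size=N_SYN)
for _ in range(1000):
    inputs, k = make_trial()
    w = train_step(w, inputs, target=k)
```

Note that the training signal in this toy task is only the aggregate count k per trial; the neuron is never told when, or even whether, the clue pattern occurred at any particular time.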
This learning was robust to high levels of feedback noise and input noise, and it performed well on a connected speech-recognition task. Aggregate-label learning also enabled populations of neurons to solve unsupervised learning tasks by relying on internally generated feedback signals that amplified correlations between the neurons’ output spike counts. These self-supervised networks discovered recurring constellations of input patterns even when these were rare and distributed over spatial and temporal scales that exceeded the receptive fields of individual neurons. Because learning in self-supervised networks is driven by aggregate numbers of feature occurrences, it does not require temporal alignment of the input activities of individual neurons. When competitive interactions between individual neurons were mediated through the internal feedback circuit, feature maps could form even when the features’ asynchrony incapacitated lateral inhibition.

CONCLUSION

Aggregate-label learning solves the long-standing problem of how neural systems can identify features within their input activity that predict delayed feedback. This solution strongly enhances the known learning capabilities of simple neural circuit models. Because the feedback can be external or internal, these enhancements apply to both supervised and unsupervised learning. In this framework, both forms of learning converge onto the same rule of synaptic plasticity, inviting future research on how they cooperate when brains learn.

Figure: Membrane potential traces of a model neuron before (top trace) and after (second through fifth traces) learning to detect reward-predictive sensory clues (colored squares). Each clue occurrence is represented as a spike pattern within the neuron’s input activity (raster plot). After learning, the number of output spikes (vertical deflections) elicited by each clue encodes the clue’s contribution to a delayed reward.

ABSTRACT

The brain routinely discovers sensory clues that predict opportunities or dangers. However, it is unclear how neural learning processes can bridge the typically long delays between sensory clues and behavioral outcomes. Here, I introduce a learning concept, aggregate-label learning, that enables biologically plausible model neurons to solve this temporal credit-assignment problem. Aggregate-label learning matches a neuron’s number of output spikes to a feedback signal that is proportional to the number of clues but carries no information about their timing. Aggregate-label learning outperforms stochastic reinforcement learning at identifying predictive clues and is able to solve unsegmented speech-recognition tasks. Furthermore, it allows unsupervised neural networks to discover recurring constellations of sensory features even when they are widely dispersed across space and time.
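The internal feedback circuit that drives the self-supervised networks is characterized above only at summary level: it amplifies correlations between the neurons' output spike counts. To convey the flavor in code, the toy rule below (an illustrative assumption, not the paper's circuit) sets each neuron's target spike count from the consensus of the other neurons, so that agreement in aggregate counts, rather than spike timing, is what gets reinforced.

```python
import numpy as np

def internal_feedback(counts):
    """Toy self-supervised targets: push each neuron's spike count toward
    the mean count of the *other* neurons. Agreement across the population
    is reinforced using aggregate counts only; no temporal alignment of
    the neurons' inputs is needed. (Illustrative assumption, not the
    paper's feedback circuit.)"""
    counts = np.asarray(counts, dtype=float)
    others_mean = (counts.sum() - counts) / (len(counts) - 1)
    return np.rint(others_mean).astype(int)

# Two neurons already agree on two feature occurrences; the dissenter
# is pulled toward the consensus:
print(internal_feedback([2, 2, 1]))   # -> [2 2 2]
```

A target produced this way could then drive any per-neuron spike-count learner, such as the train_step sketch above, in place of an externally supplied feedback signal.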
