Towards a Biologically Plausible Backprop

This work contributes several new elements to the search for a biologically plausible implementation of backpropagation in the brain. We introduce a general, abstract framework for machine learning in which the quantities of interest are defined implicitly through an energy function. As in the contrastive Hebbian learning algorithm for the continuous Hopfield model, the same kind of neural computation is used in both the first phase (when the prediction is made) and the second phase (after the target is revealed). Unlike automatic differentiation in computational graphs (i.e., standard backprop), the second phase requires no special computation. One advantage of our framework over contrastive Hebbian learning is that the second phase only nudges the first-phase fixed point towards a configuration that reduces prediction error. In a multi-layer supervised neural network, the output units are slightly nudged towards their targets, and the perturbation introduced at the output layer propagates backward through the network. The signal 'back-propagated' during this second phase carries information about the error derivatives, which we use to implement a learning rule that is proved to perform gradient descent with respect to an objective cost function.
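The two-phase procedure described above can be illustrated with a small numerical sketch. The abstract does not specify an energy function, network size, or learning rule, so everything concrete below is an assumption made for illustration: a Hopfield-style energy E(h, y) = ½‖h‖² + ½‖y‖² − ρ(h)·(W1 x) − ρ(y)·(W2 ρ(h)) with ρ = tanh, a squared-error cost C = ½‖y − d‖², a total energy F = E + βC, and the hypothetical helper names `relax` and `ep_gradients`. Both phases run the same relaxation; only the nudging strength β differs. The gradient estimate is the difference of ∂E/∂W between the nudged and free fixed points, divided by β.

```python
import numpy as np

def rho(u):
    """Activation function (assumed: tanh)."""
    return np.tanh(u)

def rho_prime(u):
    """Derivative of the assumed activation."""
    return 1.0 - np.tanh(u) ** 2

def cost(y, d):
    """Assumed prediction cost C = 0.5 * ||y - d||^2."""
    return 0.5 * np.sum((y - d) ** 2)

def relax(x, d, W1, W2, beta, init=None, steps=500, eps=0.5):
    """Gradient descent on F = E + beta * C over the state (h, y).

    The same update is used for both phases; beta = 0 gives the first
    (free) phase and a small beta > 0 gives the second (nudged) phase.
    """
    if init is None:
        h, y = np.zeros(W1.shape[0]), np.zeros(W2.shape[0])
    else:
        h, y = init[0].copy(), init[1].copy()
    for _ in range(steps):
        # dF/dh and dF/dy for the assumed energy and cost
        dh = h - rho_prime(h) * (W1 @ x + W2.T @ rho(y))
        dy = y - rho_prime(y) * (W2 @ rho(h)) + beta * (y - d)
        h, y = h - eps * dh, y - eps * dy
    return h, y

def ep_gradients(x, d, W1, W2, beta=1e-4):
    """Estimate dC/dW1 and dC/dW2 from the two fixed points."""
    h0, y0 = relax(x, d, W1, W2, beta=0.0)                   # first phase
    hb, yb = relax(x, d, W1, W2, beta=beta, init=(h0, y0))   # second phase
    # dE/dW1 = -outer(rho(h), x); dE/dW2 = -outer(rho(y), rho(h))
    g1 = -(np.outer(rho(hb), x) - np.outer(rho(h0), x)) / beta
    g2 = -(np.outer(rho(yb), rho(hb)) - np.outer(rho(y0), rho(h0))) / beta
    return g1, g2, cost(y0, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, d = rng.normal(size=4), np.array([1.0, -1.0])
    W1 = 0.1 * rng.normal(size=(5, 4))
    W2 = 0.1 * rng.normal(size=(2, 5))
    for step in range(5):
        g1, g2, c = ep_gradients(x, d, W1, W2)
        print(f"step {step}: cost at free fixed point = {c:.4f}")
        W1, W2 = W1 - 0.1 * g1, W2 - 0.1 * g2  # gradient descent step
```

Starting the second phase from the first-phase fixed point mirrors the "nudging" described in the abstract. The small initial weights are chosen so that the plain gradient-descent relaxation settles quickly; larger weights or other energies may need a different step size or more steps.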
