Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing Its Gradient Estimator Bias

Equilibrium Propagation (EP) is a biologically-inspired algorithm for convergent RNNs with a local learning rule that comes with strong theoretical guarantees. The parameter updates of the neural network during the credit assignment phase have been shown mathematically to approach the gradients provided by Backpropagation Through Time (BPTT) when the network is infinitesimally nudged toward its target. In practice, however, training a network with the gradient estimates provided by EP does not scale to visual tasks harder than MNIST. In this work, we show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon and that cancelling it allows training deep ConvNets by EP. We show that this bias can be greatly reduced by using symmetric nudging (a positive nudging and a negative one). We also generalize previous EP equations to the case of cross-entropy loss (as opposed to squared error). As a result of these advances, we are able to achieve a test error of 11.7% on CIFAR-10 by EP, which approaches the error achieved by BPTT and provides a major improvement with respect to the standard EP approach with same-sign nudging, which gives 86% test error. We also apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error. These results highlight EP as a compelling biologically-plausible approach to compute error gradients in deep neural networks.
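
To make the symmetric-nudging idea concrete, below is a minimal, self-contained sketch (not the authors' code) of the two-sided EP gradient estimator on a toy Hopfield-style energy with a single hidden layer and a squared-error cost; the energy, the network size, and all function names are illustrative assumptions, and the paper's cross-entropy variant is omitted for brevity. The key point is that the parameter gradient is estimated from the symmetric difference of dE/dtheta between two nudged equilibria at +beta and -beta, giving an O(beta^2) bias instead of the O(beta) bias of the one-sided estimator.

```python
# Minimal sketch of EP with symmetric (two-sided) nudging on a toy
# Hopfield-style network. Assumptions: the energy, cost, sizes, and
# hyperparameters below are illustrative, not the paper's architecture.
import torch

torch.manual_seed(0)
n_in, n_hid, n_out = 4, 8, 2
W1 = torch.randn(n_in, n_hid) * 0.1       # input-to-hidden weights
W2 = torch.randn(n_hid, n_out) * 0.1      # hidden-to-output weights

def rho(s):
    # Bounded activation entering the energy function
    return torch.sigmoid(s)

def energy(x, h, y, W1, W2):
    # Hopfield-style energy: quadratic containment minus interaction terms
    return (0.5 * (h ** 2).sum() + 0.5 * (y ** 2).sum()
            - (rho(x) @ W1 * rho(h)).sum()
            - (rho(h) @ W2 * rho(y)).sum())

def cost(y, target):
    # Squared-error cost on the output units (the paper also derives a
    # cross-entropy variant; squared error keeps this sketch short)
    return 0.5 * ((y - target) ** 2).sum()

def relax(x, target, W1, W2, beta, h, y, steps=100, lr=0.1):
    # Relax the state by gradient descent on the total energy E + beta * C,
    # starting from the provided initial state (h, y)
    h = h.clone().requires_grad_()
    y = y.clone().requires_grad_()
    for _ in range(steps):
        F = energy(x, h, y, W1, W2) + beta * cost(y, target)
        gh, gy = torch.autograd.grad(F, (h, y))
        with torch.no_grad():
            h -= lr * gh
            y -= lr * gy
    return h.detach(), y.detach()

def ep_symmetric_grads(x, target, W1, W2, beta=0.1):
    # Free phase (beta = 0): its equilibrium seeds both nudged phases
    h0, y0 = relax(x, target, W1, W2, 0.0, torch.zeros(n_hid), torch.zeros(n_out))
    side = []
    for sign in (+1.0, -1.0):
        # Nudged phases at +beta and at -beta
        h, y = relax(x, target, W1, W2, sign * beta, h0, y0)
        W1_, W2_ = W1.clone().requires_grad_(), W2.clone().requires_grad_()
        E = energy(x, h, y, W1_, W2_)
        side.append(torch.autograd.grad(E, (W1_, W2_)))
    # Symmetric difference of dE/dtheta between the two nudged equilibria:
    # the bias of this loss-gradient estimate is O(beta^2), versus O(beta)
    # for the one-sided estimator that compares +beta with the free phase
    return [(gp - gm) / (2 * beta) for gp, gm in zip(*side)]

# One local learning step on a toy example
x = torch.randn(n_in)
target = torch.tensor([1.0, 0.0])
gW1, gW2 = ep_symmetric_grads(x, target, W1, W2)
W1 -= 0.05 * gW1
W2 -= 0.05 * gW2
```

Each weight update above depends only on quantities available at the two equilibria (a finite-difference of local energy terms), which is what makes the rule local; the symmetric estimator doubles the number of nudged relaxations per example in exchange for the much smaller bias.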
