Training End-to-End Analog Neural Networks with Equilibrium Propagation

We introduce a principled method for training end-to-end analog neural networks by stochastic gradient descent. In these analog neural networks, the weights to be adjusted are implemented by the conductances of programmable resistive devices such as memristors [Chua, 1971], and the nonlinear transfer functions (or 'activation functions') are implemented by nonlinear components such as diodes. We show mathematically that nonlinear resistive networks, a class of analog neural networks, are energy-based models: they possess an energy function as a consequence of Kirchhoff's laws governing electrical circuits. This property allows them to be trained with the Equilibrium Propagation framework [Scellier and Bengio, 2017]. Our update rule for each conductance, which is local and relies solely on the voltage drop across the corresponding resistor, is shown to compute the gradient of the loss function. Our numerical simulations, which use the SPICE-based Spectre framework to model the dynamics of electrical circuits, demonstrate training on the MNIST classification task with performance comparable to, or better than, that of software-based neural networks of the same size. Our work can guide the development of a new generation of ultra-fast, compact, and low-power neural networks that support on-chip learning.
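
To make the conductance update concrete, the following is a minimal NumPy sketch of Equilibrium Propagation on a toy network of linear resistors. It is not the paper's Spectre setup (which also includes diodes, amplifiers, and current sources); it assumes a squared-error cost on the output node voltage, and the function and parameter names (settle, eqprop_step, beta, lr) are illustrative choices rather than the paper's notation.

import numpy as np

# Toy linear resistive network: the equilibrium node voltages minimise the power
# P(V) = 1/2 * sum_ij g_ij (V_i - V_j)^2, so the steady state is obtained by
# solving Kirchhoff's current law (a linear system in the floating-node voltages).

def settle(G, v_in, idx_in, idx_free, i_ext=None):
    """Steady-state node voltages with the nodes in idx_in clamped to v_in.

    G     : symmetric matrix of conductances g_ij (0 where no resistor exists)
    i_ext : optional currents injected at the floating nodes (used for nudging)
    """
    L = np.diag(G.sum(axis=1)) - G                      # circuit graph Laplacian
    if i_ext is None:
        i_ext = np.zeros(len(idx_free))
    # KCL at the floating nodes:  L_ff V_free = i_ext - L_fi V_in
    A = L[np.ix_(idx_free, idx_free)]
    b = i_ext - L[np.ix_(idx_free, idx_in)] @ v_in
    v = np.zeros(G.shape[0])
    v[idx_in] = v_in
    v[idx_free] = np.linalg.solve(A, b)
    return v

def eqprop_step(G, v_in, target, idx_in, idx_free, idx_out, beta=0.01, lr=0.05):
    """One Equilibrium Propagation update of all conductances."""
    # Free phase: inputs clamped, every other node settles freely.
    v_free = settle(G, v_in, idx_in, idx_free)
    # Nudged phase: inject a small current beta * (target - V_out) at each output,
    # i.e. minus beta times the gradient of a squared-error cost.
    i_ext = np.zeros(len(idx_free))
    for k, node in enumerate(idx_free):
        if node in idx_out:
            i_ext[k] = beta * (target[idx_out.index(node)] - v_free[node])
    v_nudged = settle(G, v_in, idx_in, idx_free, i_ext)
    # Local rule: each conductance only needs the voltage drop across its own
    # resistor in the two phases:
    #   dLoss/dg_ij ~ [ (dV_ij^nudged)^2 - (dV_ij^free)^2 ] / (2 * beta)
    dv_free = v_free[:, None] - v_free[None, :]
    dv_nudged = v_nudged[:, None] - v_nudged[None, :]
    grad = (dv_nudged ** 2 - dv_free ** 2) / (2.0 * beta)
    # Gradient descent on the existing resistors; conductances stay positive.
    return np.where(G > 0, np.clip(G - lr * grad, 1e-6, None), 0.0)

# Usage: node 0 is the input (1 V), node 1 a reference (0 V), node 2 is hidden,
# node 3 is the output; repeated updates pull the output voltage towards 0.2 V.
G = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5, 0.7],
              [1.0, 0.5, 0.0, 0.8],
              [0.0, 0.7, 0.8, 0.0]])
for _ in range(200):
    G = eqprop_step(G, v_in=np.array([1.0, 0.0]), target=np.array([0.2]),
                    idx_in=[0, 1], idx_free=[2, 3], idx_out=[3])

Both phases only require letting the circuit settle and reading the voltage drop across each programmable resistor, which is what makes the rule local and, in principle, implementable on-chip.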

[1] Yu Wang, et al. MErging the Interface: Power, area and accuracy co-optimization for RRAM crossbar-based mixed-signal computing system, 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[2] R. Williams, et al. Repeatable, accurate, and high speed multi-level programming of memristor 1T1R arrays for power efficient analog computing applications, 2016, Nanotechnology.

[3] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups (DOI: 10.1109/MSP.2012.2205597), 2012, IEEE Signal Processing Magazine.

[4] Chih-Cheng Chang, et al. Mitigating Asymmetric Nonlinear Weight Update Effects in Hardware Neural Network Based on Analog Resistive Synapse, 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[5] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Fu Jie Huang, et al. A Tutorial on Energy-Based Learning, 2006.

[7] Sangsu Park, et al. Conductive-bridging random-access memories for emerging neuromorphic computing, 2020, Nanoscale.

[8] Qing Wu, et al. Efficient and self-adaptive in-situ learning in multilayer memristor neural networks, 2018, Nature Communications.

[9] F. Merrikh Bayat, et al. Fast, energy-efficient, robust, and reproducible mixed-signal neuromorphic classifier based on embedded NOR flash memory technology, 2017, 2017 IEEE International Electron Devices Meeting (IEDM).

[10] Yoshua Bengio, et al. Equilibrium Propagation with Continual Weight Updates, 2019, ArXiv.

[11] James C. R. Whittington, et al. Theories of Error Back-Propagation in the Brain, 2019, Trends in Cognitive Sciences.

[12] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[13] Advait Madhavan, et al. Streaming Batch Eigenupdates for Hardware Neural Networks, 2019, Front. Neurosci.

[14] Peng Lin, et al. Reinforcement learning with analogue memristor arrays, 2019, Nature Electronics.

[15] Paolo Fantini, et al. Unsupervised Learning by Spike Timing Dependent Plasticity in Phase Change Memory (PCM) Synapses, 2016, Front. Neurosci.

[16] Yoshua Bengio, et al. Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input, 2019, NeurIPS.

[17] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.

[18] Armantas Melianas, et al. Parallel programming of an ionic floating-gate memory array for scalable neuromorphic computing, 2019, Science.

[19] Jacques-Olivier Klein, et al. Spin-Transfer Torque Magnetic Memory as a Stochastic Memristive Synapse for Neuromorphic Systems, 2015, IEEE Transactions on Biomedical Circuits and Systems.

[20] Catherine Graves, et al. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication, 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).

[21] Ahmed Faraz Khan. Bidirectional Learning in Recurrent Neural Networks Using Equilibrium Propagation, 2018.

[22] Yoshua Bengio, et al. Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation, 2016, Front. Comput. Neurosci.

[23] Yoshua Bengio, et al. Generalization of Equilibrium Propagation to Vector Field Dynamics, 2018, ArXiv.

[24] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.

[25] Pritish Narayanan, et al. Neuromorphic computing using non-volatile memory, 2017.

[26] J. Yang, et al. Memristive crossbar arrays for brain-inspired computing, 2019, Nature Materials.

[27] William J. Dally, et al. Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).

[28] Pritish Narayanan, et al. Equivalent-accuracy accelerated neural-network training using analogue memory, 2018, Nature.

[29] Shimeng Yu, et al. Ferroelectric FET analog synapse for acceleration of deep neural network training, 2017, 2017 IEEE International Electron Devices Meeting (IEDM).

[30] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[31] Max Welling, et al. Initialized Equilibrium Propagation for Backprop-Free Training, 2019, ICLR.

[32] Fernando Corinto, et al. Equilibrium Propagation for Memristor-Based Recurrent Neural Networks, 2020, Frontiers in Neuroscience.

[33] Stephen Grossberg, et al. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[34] Javier R. Movellan. Contrastive Hebbian Learning in the Continuous Hopfield Model, 1991.

[35] Wei Yang Lu, et al. Nanoscale memristor device as synapse in neuromorphic systems, 2010, Nano Letters.

[36] L. Chua. Memristor-The missing circuit element, 1971.

[37] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons, 1984, Proceedings of the National Academy of Sciences of the United States of America.

[38] Carver Mead. Analog VLSI and neural systems, 1989.

[39] Geoffrey E. Hinton, et al. A Learning Algorithm for Boltzmann Machines, 1985, Cogn. Sci.

[40] Brendan Fong, et al. A Compositional Framework for Passive Linear Networks, 2015, ArXiv (1504.05625).

[41] Surya Ganguli, et al. A deep learning framework for neuroscience, 2019, Nature Neuroscience.

[42] Yoshua Bengio, et al. Equivalence of Equilibrium Propagation and Recurrent Backpropagation, 2017, Neural Computation.

[43] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[44] Juan C. Nino, et al. Deep Learning in Memristive Nanowire Networks, 2020, ArXiv.

[45] Sujan Kumar Gonugondla, et al. An MRAM-Based Deep In-Memory Architecture for Deep Neural Networks, 2019, 2019 IEEE International Symposium on Circuits and Systems (ISCAS).

[46] Adam Santoro, et al. Backpropagation and the brain, 2020, Nature Reviews Neuroscience.

[47] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.

[48] Yachen Xiang, et al. Hardware implementation of RRAM based binarized neural networks, 2019, APL Materials.

[49] Max Welling, et al. Training a Spiking Neural Network with Equilibrium Propagation, 2019, AISTATS.

[50] Kari Christianson, et al. Nonlinear Electrical Networks, 2010.