Deep Learning in Memristive Nanowire Networks

Analog crossbar architectures for accelerating neural network training and inference have made tremendous progress over the past several years. These architectures are ideal for dense layers with fewer than roughly a thousand neurons, but they are highly inefficient for large sparse layers. A recently described hardware architecture, the MN3 (Memristive Nanowire Neural Network), addresses this gap by efficiently simulating very wide, sparse neural network layers, on the order of millions of neurons per layer. The MN3 utilizes a high-density memristive nanowire mesh to efficiently connect large numbers of silicon neurons with modifiable weights. Here, in order to explore the MN3's ability to function as a deep neural network, we describe an algorithm for training deep MN3 models and benchmark simulations of the architecture on two deep learning tasks. We utilize a simple piecewise linear memristor model, since we seek to demonstrate that training is, in principle, possible for randomized nanowire architectures; in future work, we intend to utilize more realistic memristor models and adapt the presented algorithm accordingly. We show that the MN3 is capable of performing composition, gradient propagation, and weight updates, which together allow it to function as a deep neural network. We further show that a simulated multilayer perceptron (MLP), built from MN3 networks, can obtain a 1.61% error rate on the popular MNIST dataset, comparable to an equivalently sized software-based network. This work represents, to the authors' knowledge, the first randomized nanowire architecture capable of reproducing the backpropagation algorithm.

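To make the "simple piecewise linear memristor model" mentioned above concrete, the following is a minimal Python sketch of one common threshold-type piecewise linear model: below a programming threshold the device reads as a linear resistor, and above it the conductance drifts linearly with the voltage overdrive, clipped to the device's conductance range. The class name, parameter names, and all numeric values (g_min, g_max, v_thresh, beta) are illustrative assumptions, not the values or interface used in the paper.

```python
import numpy as np

class PiecewiseLinearMemristor:
    """Threshold-type piecewise linear memristor (illustrative sketch only)."""

    def __init__(self, g_min=1e-6, g_max=1e-3, v_thresh=0.5, beta=1e-4):
        self.g_min = g_min        # minimum conductance, in siemens (assumed value)
        self.g_max = g_max        # maximum conductance, in siemens (assumed value)
        self.v_thresh = v_thresh  # programming threshold voltage (assumed value)
        self.beta = beta          # conductance change per volt-second of overdrive (assumed)
        self.g = g_min            # current conductance state

    def read(self, v):
        # Below threshold the device behaves as a linear resistor: i = g * v.
        return self.g * v

    def program(self, v, dt):
        # Above threshold the conductance drifts linearly with the overdrive
        # |v| - v_thresh, in the direction of the applied voltage, and is
        # clipped to the physically allowed range [g_min, g_max].
        if abs(v) > self.v_thresh:
            dg = self.beta * np.sign(v) * (abs(v) - self.v_thresh) * dt
            self.g = float(np.clip(self.g + dg, self.g_min, self.g_max))
        return self.g

# Example: a write pulse above threshold moves the conductance state,
# while a read voltage below threshold leaves it untouched.
m = PiecewiseLinearMemristor()
m.program(v=1.0, dt=1e-3)   # potentiation pulse
print(m.read(v=0.2))        # read current; the read does not disturb the state
```

In a training loop, the sign and duration of the programming pulse would encode the weight update computed by backpropagation; the piecewise linearity keeps that mapping simple, which is presumably why a model of this form suits an in-principle demonstration of trainability.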