TIME: A training-in-memory architecture for memristor-based deep neural networks

Training a neural network (NN) is usually time-consuming and resource-intensive. Memristors have shown their potential for NN computation. In particular, metal-oxide resistive random access memory (RRAM), with its crossbar structure and multi-bit cells, can perform the matrix-vector product, the most common operation in NNs, with high precision. However, two challenges remain in realizing NN training on RRAM. First, existing architectures support only the inference phase and cannot perform backpropagation (BP) or the weight update. Second, training requires an enormous number of iterations and constant weight updates to reach convergence, which leads to large energy consumption due to the many read and write operations. In this work, we propose a novel architecture, TIME, together with peripheral circuit designs, to enable NN training in RRAM. TIME supports BP and the weight update while maximizing the reuse of the peripheral circuits used for inference on RRAM. Meanwhile, a variability-free tuning scheme and gradually-write circuits are designed to reduce the cost of tuning RRAM. We explore the performance of both supervised learning (SL) and deep reinforcement learning (DRL) on TIME, and a DRL-specific mapping method is introduced to further improve energy efficiency. Experimental results show that, for SL, TIME achieves 5.3× higher energy efficiency on average than the most powerful application-specific integrated circuit (ASIC) in the literature. For DRL, TIME achieves on average 126× higher energy efficiency than a GPU. If the cost of tuning RRAM can be further reduced, TIME has the potential to boost energy efficiency by two orders of magnitude compared with the ASIC.
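As a minimal sketch of the operations the abstract refers to (not the paper's actual circuit model), the snippet below treats the crossbar's analog matrix-vector product as an idealized dot product of conductances and input voltages, and shows the BP step whose resulting weight gradient must be written back into the RRAM cells; all function and variable names here are illustrative assumptions.

```python
import numpy as np

# Idealized crossbar: conductance matrix G (rows = wordlines/inputs,
# cols = bitlines/outputs) performs an analog matrix-vector product
# in a single read cycle; here modeled simply as v @ G.
def crossbar_mvm(G, v):
    return v @ G  # one output current per bitline

# One forward + backward step for a single fully connected layer,
# producing the conductance update that training must write into RRAM.
def train_step(G, x, target, lr=0.01):
    # Forward (inference) pass: reuses the crossbar read path.
    z = crossbar_mvm(G, x)
    y = np.maximum(z, 0.0)            # ReLU activation

    # Backward pass (BP): error and gradient w.r.t. the layer weights.
    err = y - target                  # derivative of 0.5 * ||y - target||^2
    dz = err * (z > 0)                # back through ReLU
    dG = np.outer(x, dz)              # weight (conductance) gradient

    # Weight update: in an RRAM implementation this corresponds to tuning
    # cell conductances, the costly write operation TIME tries to minimize.
    return G - lr * dG

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(4, 3))   # initial conductances
x = rng.uniform(size=4)
target = np.array([0.2, 0.5, 0.1])
for _ in range(100):
    G = train_step(G, x, target)
```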
