A 3D multi-layer CMOS-RRAM accelerator for neural network

Incremental machine learning is required for future real-time data analytics. This paper introduces a 3D multi-layer CMOS-RRAM accelerator for incremental least-squares-based learning on neural networks. With the input data buffered on an RRAM memory layer, the intensive matrix-vector multiplications are first accelerated on a digitized RRAM-crossbar layer. The remaining incremental least-squares operations for feature extraction and classifier training are then accelerated on a CMOS-ASIC layer by an incremental Cholesky-factorization accelerator designed with parallelism and pipelining in mind. Experimental results show that the 3D accelerator significantly reduces training time with acceptable accuracy. Compared to a 3D-CMOS-ASIC implementation, it achieves 1.28x smaller area, 2.05x faster runtime, and 12.4x lower energy; compared to a GPU implementation, it shows a 3.07x speed-up and 162.86x energy saving.
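As a rough illustration of the incremental least-squares training step described above, the sketch below accumulates the regularized Gram matrix over data batches and solves for the classifier weights with a Cholesky factorization. All names, shapes, and the regularization value are illustrative assumptions; unlike the paper's accelerator, which updates the factorization incrementally in hardware, this sketch simply re-factorizes the accumulated matrix, which yields the same final weights.

```python
# Minimal sketch (assumed interface, not the paper's implementation):
# incremental least-squares classifier training via Cholesky factorization.
import numpy as np

def train_incremental(H_batches, T_batches, lam=1e-3):
    """Solve W = argmin ||H W - T||^2 + lam*||W||^2, batch by batch.

    H_batches: iterable of (n_i, d) hidden-feature matrices
    T_batches: iterable of (n_i, c) target matrices (e.g. one-hot labels)
    """
    A = None  # running Gram matrix  H^T H + lam * I
    b = None  # running right-hand side  H^T T
    for H, T in zip(H_batches, T_batches):
        if A is None:
            A = lam * np.eye(H.shape[1])
            b = np.zeros((H.shape[1], T.shape[1]))
        A += H.T @ H          # incremental Gram-matrix update
        b += H.T @ T
    L = np.linalg.cholesky(A)  # A = L L^T, L lower-triangular
    # Two triangular solves replace an explicit matrix inverse.
    y = np.linalg.solve(L, b)
    W = np.linalg.solve(L.T, y)
    return W
```

In the accelerator, the H^T H and H^T T products above would map to the RRAM-crossbar matrix-vector multiplication layer, while the Cholesky factorization and triangular solves correspond to the CMOS-ASIC layer.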
