On-line machine learning accelerator on digital RRAM-crossbar

On-line machine learning has become a pressing need for future data analytics. This work presents an ℓ2-norm based hardware solver for on-line machine learning that can significantly reduce training time compared to the traditional gradient-based solution using backward propagation. We show that the intensive matrix-vector multiplication in the ℓ2-norm solution can be mapped onto a distributed in-memory accelerator built from the recent resistive switching random access memory (RRAM) device. A digitized matrix-vector multiplication accelerator is developed based on the distributed RRAM-crossbar. Such a distributed RRAM-crossbar architecture supports the reformulated ℓ2-norm solver as a scalable and energy-efficient solution for real-time training and testing in image recognition. Experimental results show that significant speedup is achieved for the matrix-vector multiplication in the ℓ2-norm solver, so that both the overall training time and the testing time are reduced. In addition, large energy savings are achieved compared to the traditional CMOS-based out-of-memory computing architecture.
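For concreteness, the sketch below illustrates the two ingredients described above under stated assumptions: a closed-form ℓ2-norm (ridge-regularized least-squares) solve for the output weights of an ELM-style single-hidden-layer network, and a bit-sliced emulation of the digitized matrix-vector multiplication that a binary RRAM-crossbar would perform. The function names, bit widths, and the ELM-style formulation are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (assumptions: ELM-style network, ridge/ℓ2-regularized
# least-squares for the output weights, and a bit-sliced binary matrix-vector
# product emulating a digitized RRAM-crossbar; names and bit widths are illustrative).
import numpy as np

def l2_solve(H, T, lam=1e-3):
    """Closed-form ℓ2-norm (ridge) solution: beta = (H^T H + lam*I)^(-1) H^T T."""
    n_hidden = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)

def bitsliced_matvec(W, x, w_bits=4, x_bits=4):
    """Emulate a digitized crossbar matrix-vector product.

    W and x (assumed scaled to [0, 1]) are quantized to unsigned integers; each
    bit-plane of W acts as a binary (0/1) crossbar, and the partial products are
    recombined with shift-adds, mimicking how a binary RRAM array would
    accumulate its column outputs digitally.
    """
    w_q = np.clip(np.round(W * (2**w_bits - 1)), 0, 2**w_bits - 1).astype(np.int64)
    x_q = np.clip(np.round(x * (2**x_bits - 1)), 0, 2**x_bits - 1).astype(np.int64)
    acc = np.zeros(W.shape[0], dtype=np.int64)
    for b in range(w_bits):
        plane = (w_q >> b) & 1          # binary crossbar holding weight bit b
        acc += (plane @ x_q) << b       # shift-add recombination of bit-planes
    return acc / ((2**w_bits - 1) * (2**x_bits - 1))  # rescale to the real-valued result
```

As a usage example, `bitsliced_matvec(W, x)` approximates `W @ x` using only binary weight planes and shift-adds, which is the kind of digitized matrix-vector operation the distributed crossbar is meant to accelerate, while `l2_solve` stands in for the ℓ2-norm training step that replaces gradient-based backward propagation.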
