Analog CMOS-based resistive processing unit for deep neural network training

Recently we have shown that an architecture based on resistive processing unit (RPU) devices has potential to achieve significant acceleration in deep neural network (DNN) training compared to today's software-based DNN implementations running on CPU/GPU. However, currently available device candidates based on non-volatile memory technologies do not satisfy all the requirements to realize the RPU concept. Here, we propose an analog CMOS-based RPU design (CMOS RPU) which can store and process data locally and can be operated in a massively parallel manner. We analyze various properties of the CMOS RPU to evaluate the functionality and feasibility for acceleration of DNN training.

[1]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[2]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[3]  Gökmen Tayfun,et al.  Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices: Design Considerations , 2016, Front. Neurosci..

[4]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[6]  Suyog Gupta,et al.  Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study , 2015, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[7]  Pritish Narayanan,et al.  Neuromorphic computing using non-volatile memory , 2017 .