An area and energy efficient design of domain-wall memory-based deep convolutional neural networks using stochastic computing

With the recent trend toward wearable devices and the Internet of Things (IoT), it has become attractive to develop hardware-based deep convolutional neural networks (DCNNs) for embedded applications, which require low power/energy consumption and small hardware footprints. Recent works have demonstrated that the Stochastic Computing (SC) technique can radically simplify the hardware implementation of arithmetic units and has the potential to satisfy the stringent power requirements of embedded devices. However, these works neglect memory design optimization for weight storage, which inevitably results in large hardware cost. Moreover, if conventional volatile SRAM or DRAM cells are used for weight storage, the weights must be re-initialized whenever the DCNN platform is restarted. To overcome these limitations, in this work we adopt an emerging non-volatile Domain-Wall Memory (DWM), which can achieve ultra-high density, to replace SRAM for weight storage in SC-based DCNNs. We propose DW-CNN, the first comprehensive design optimization framework for DWM-based weight storage. We derive the optimal memory type, precision, and organization, as well as whether to store binary or stochastic numbers. We present an effective resource-sharing scheme for DWM-based weight storage in the convolutional and fully-connected layers of SC-based DCNNs to achieve a desirable balance among area, power (energy) consumption, and application-level accuracy.
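To illustrate why SC simplifies arithmetic, and why "binary or stochastic" is a real storage trade-off, here is a minimal software sketch. It assumes unipolar encoding (a value p in [0, 1] is a bitstream whose bits are 1 with probability p). Under that encoding, multiplication reduces to one AND gate per bit, and converting back to binary is just counting 1s. The function names and the `storage_bits` cost model are hypothetical and illustrative only; they are not the DW-CNN framework itself.

```python
import random


def to_stream(p: float, length: int, rng: random.Random) -> list[int]:
    """Stochastic number generator (SNG): compare a random number with p.
    Each bit is 1 with probability p (unipolar encoding, p in [0, 1])."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a: list[int], b: list[int]) -> list[int]:
    """Unipolar SC multiplication: a bitwise AND of two independent streams."""
    return [x & y for x, y in zip(a, b)]


def to_value(stream: list[int]) -> float:
    """De-randomize: count the 1s (a counter in hardware) and normalize."""
    return sum(stream) / len(stream)


def storage_bits(precision_bits: int, store_stochastic: bool) -> int:
    """Hypothetical per-weight storage cost: k bits if the weight is stored in
    binary (an SNG is then needed at read time), or a 2**k-bit stream for
    comparable resolution if the stochastic form is stored directly."""
    return 2 ** precision_bits if store_stochastic else precision_bits


if __name__ == "__main__":
    # Separate seeded generators keep the two streams uncorrelated,
    # which unipolar AND multiplication requires to be accurate.
    a = to_stream(0.5, 4096, random.Random(1))
    b = to_stream(0.75, 4096, random.Random(2))
    print("SC estimate of 0.5 * 0.75:", to_value(sc_multiply(a, b)))
    print("bits per 8-bit weight (binary, stochastic):",
          storage_bits(8, False), storage_bits(8, True))
```

The estimate converges to 0.375 as the stream grows, at the cost of longer streams. Signed DCNN weights are usually handled with bipolar encoding, where an XNOR gate replaces the AND gate. Storing streams avoids SNG hardware but multiplies memory size, which is where a high-density memory such as DWM becomes relevant.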
