SkippyNN: An Embedded Stochastic-Computing Accelerator for Convolutional Neural Networks

Employing convolutional neural networks (CNNs) in embedded devices calls for novel low-cost and energy-efficient CNN accelerators. Stochastic computing (SC) is a promising low-cost alternative to conventional binary implementations of CNNs. Despite its low-cost advantage, SC-based arithmetic units suffer from prohibitive execution times due to processing long bit-streams. In particular, multiplication, the main operation in convolution, is extremely time-consuming and hampers the use of SC methods in embedded CNNs. In this work, we propose a novel architecture, called SkippyNN, that reduces the computation time of SC-based multiplications in the convolutional layers of CNNs. Each convolution in a CNN is composed of numerous multiplications in which each input value is multiplied by a weight vector. Once the result of the first multiplication is produced, the following multiplications can be performed by multiplying the input by the differences of successive weights. Leveraging this property, we develop a differential Multiply-and-Accumulate unit, called DMAC, to reduce the time consumed by convolutions in SkippyNN. We evaluate the efficiency of SkippyNN using four modern CNNs. On average, SkippyNN offers a 1.2x speedup and 2.7x energy saving compared to binary implementations of CNN accelerators.
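
The differential idea behind DMAC can be illustrated with ordinary binary arithmetic: once x * w[0] is available, each subsequent product x * w[i] follows by adding x * (w[i] - w[i-1]) to the previous product, so only the (typically small) weight differences need to be multiplied. The Python sketch below demonstrates only this arithmetic identity, not the stochastic bit-stream hardware described in the paper; the function name differential_products is hypothetical.

```python
def differential_products(x, weights):
    """Compute [x * w for w in weights] by multiplying x only by the
    differences of successive weights (the idea behind DMAC, shown in
    plain binary arithmetic rather than stochastic bit-streams)."""
    products = []
    prev_w = 0
    prev_product = 0
    for w in weights:
        # x * w = x * prev_w + x * (w - prev_w): only the difference
        # is multiplied; the previous product is reused.
        prev_product += x * (w - prev_w)
        prev_w = w
        products.append(prev_product)
    return products

# Sanity check: matches the direct products.
assert differential_products(7, [3, 5, 2]) == [7 * 3, 7 * 5, 7 * 2]
```

In the SC setting, the payoff comes from the small differences being cheaper to process as bit-streams; the reported 1.2x speedup and 2.7x energy saving rest on that hardware-level effect, which this sketch does not model.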
