SLID: Exploiting Spatial Locality in Input Data as a Computational Reuse Method for Efficient CNN

Convolutional Neural Networks (CNNs) have revolutionized computer vision, achieving state-of-the-art performance in image processing, object recognition, and video classification. Even though CNN inference is notoriously compute-intensive, with convolutions accounting for more than 90% of the total operations, the ability to trade off accuracy, performance, power, and latency to meet target application requirements keeps it an open research topic. This paper proposes the Spatial Locality in Input Data (SLID) method for computational reuse during the inference stage of a pre-trained network. The method exploits the spatial locality of the input data by skipping part of the multiply-and-accumulate (MAC) operations for adjacent data and equating their values to previously computed ones. SLID improves the throughput of resource-constrained devices (Internet-of-Things and edge devices) and accelerates inference by reducing the number of MAC operations. This approximate computing scheme requires neither a similarity quantification step nor any modification to the training stage. The computational data reuse was evaluated on three well-known, distinct CNN structures and datasets with alternating layer selections: LeNet, CIFAR-10, and AlexNet. The method saves up to 34.9%, 49.84%, and 31.5% of MAC operations while reducing accuracy by 8%, 3.7%, and 5.0% for the three models, respectively. In addition, the proposed method saves memory accesses by eliminating data fetching for skipped inputs. Furthermore, the effects of filter size, stride, and padding on accuracy and operation savings are analyzed. SLID is the first work to exploit input spatial locality to save CNN convolution operations with minimal accuracy loss and without memory or computational overhead, making it a strong candidate for supporting intelligence at the edge.
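To make the reuse idea concrete, the sketch below shows a naive 2D convolution in which every other output column copies its neighbor's previously computed value instead of performing the MAC operations. This is only an illustration of spatial-locality reuse under an assumed 50/50 skip pattern; the function name `conv2d_slid`, the `skip_stride` parameter, and the column-wise policy are illustrative assumptions and not the paper's exact layer-selection algorithm.

```python
import numpy as np

def conv2d_slid(x, w, skip_stride=2):
    """Naive 2D convolution that reuses the previously computed output for
    every `skip_stride`-th column instead of recomputing it.

    Illustrative sketch only: the actual SLID layer/stride selection policy
    is not reproduced here.
    """
    H, W = x.shape
    k = w.shape[0]
    out_h, out_w = H - k + 1, W - k + 1
    y = np.zeros((out_h, out_w))
    macs_done = macs_skipped = 0
    for i in range(out_h):
        for j in range(out_w):
            if j % skip_stride == 1:
                # Adjacent inputs are assumed similar: copy the neighbor's
                # result and skip the k*k MACs (and the input fetches).
                y[i, j] = y[i, j - 1]
                macs_skipped += k * k
            else:
                # Regular MAC computation over the k x k window.
                y[i, j] = np.sum(x[i:i + k, j:j + k] * w)
                macs_done += k * k
    return y, macs_done, macs_skipped

# Example: roughly half of the MACs are skipped on an 8x8 input, 3x3 filter.
x = np.random.rand(8, 8)
w = np.random.rand(3, 3)
y, done, skipped = conv2d_slid(x, w)
print(f"MACs done: {done}, skipped: {skipped}")
```

Because the skipped positions copy an already-computed neighbor, no similarity check is needed at run time, and the corresponding input fetches can also be avoided, which is the source of the memory-access savings mentioned above.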
