High Performance Convolution Using Sparsity and Patterns for Inference in Deep Convolutional Neural Networks

Deploying deep Convolutional Neural Networks (CNNs) is constrained by their memory footprint and speed requirements, which stem mainly from convolution. Widely used convolution algorithms such as im2col and MEC produce a lowered matrix from an activation map by redundantly storing the map's elements that fall under horizontal and/or vertical kernel overlaps, without exploiting the sparsity of the map. Exploiting this sparsity, this paper proposes two new convolution algorithms, dubbed Compressed Pattern Overlap (CPO) and Compressed Pattern Sets (CPS), that simultaneously reduce the memory footprint and increase inference speed while preserving accuracy. CPO recognizes non-zero elements (NZEs) at horizontal and vertical overlaps in the activation maps. CPS further improves on CPO's memory savings by compressing the index positions of neighboring NZEs. Both algorithms skip channels/regions of the activation maps that are entirely zero. CPO/CPS then performs convolution via Sparse Matrix-Vector Multiplication (SpMV) on its sparse representation. Experimental results on CPUs show average per-layer time savings of up to 63% and a Compression Ratio (CR) of up to 26x with respect to im2col. In some layers, the average per-layer time savings of CPO/CPS exceed those of the parallel implementation of MEC by 28%, and the CR is better by 9.2x. For a given CNN's inference, we select offline, for each convolution layer, the fastest algorithm among CPO, CPS, and im2col. Our algorithms are selected for up to 56% of the non-pointwise convolutional layers. These offline selections yield CNN inference time savings of up to 9% and a CR of up to 10x.
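To make the contrast concrete, the sketch below is a rough Python/SciPy illustration of the general idea, not the authors' actual CPO/CPS encodings: it lowers an activation map with a plain im2col (which duplicates every element covered by overlapping kernel positions), skips all-zero channels, and then computes each output feature map as an SpMV against a CSR copy of the lowered matrix. The function names, the CSR storage choice, and the channel-skipping heuristic are assumptions made for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix


def im2col(activation, kh, kw, stride=1):
    """Lower a (C, H, W) activation map into a dense matrix whose columns are
    flattened kernel-sized patches. Overlapping patches duplicate elements,
    which is the redundancy that sparsity-aware formats such as CPO/CPS avoid."""
    C, H, W = activation.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    cols = np.empty((C * kh * kw, out_h * out_w), dtype=activation.dtype)
    idx = 0
    for i in range(0, H - kh + 1, stride):
        for j in range(0, W - kw + 1, stride):
            cols[:, idx] = activation[:, i:i + kh, j:j + kw].reshape(-1)
            idx += 1
    return cols, out_h, out_w


def sparse_conv_spmv(activation, kernels, stride=1):
    """Hypothetical sketch: convolution expressed as sparse matrix-vector products.
    Entirely-zero channels are skipped before lowering, the lowered matrix is kept
    in CSR form, and each output map is one SpMV against a flattened kernel."""
    C, H, W = activation.shape
    M, Ck, kh, kw = kernels.shape
    assert Ck == C
    # Skip channels of the activation map that contain only zeros.
    keep = [c for c in range(C) if np.any(activation[c])]
    act = activation[keep]
    ker = kernels[:, keep]
    lowered, out_h, out_w = im2col(act, kh, kw, stride)
    sparse_lowered = csr_matrix(lowered.T)          # (out_h*out_w, C'*kh*kw), only NZEs stored
    out = np.empty((M, out_h, out_w), dtype=activation.dtype)
    for m in range(M):
        w = ker[m].reshape(-1)                      # flattened kernel as a dense vector
        out[m] = (sparse_lowered @ w).reshape(out_h, out_w)
    return out
```

The benefit in this toy version comes only from the CSR storage and the skipped channels; CPO/CPS go further by building the sparse representation directly from the activation map's overlap patterns, so the redundant dense lowering step above is avoided altogether.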
