TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks

Although convolutional neural network (CNN) models have greatly advanced many fields, the enormous number of parameters and computations in these models poses significant performance and energy challenges for hardware implementations. Transferred filter-based methods, promising techniques that have not yet been explored in the architecture domain, can substantially compress CNN models. However, their straightforward hardware implementation inherently incurs massive redundant computations, causing significant energy and time consumption. In this work, a highly efficient transferred filter-based engine (TFE) is developed to alleviate this deficiency, compressing and accelerating CNN models. First, the filters of CNN models are flexibly transferred according to specific tasks to reduce the model size. Then, two hardware-friendly mechanisms are proposed in the TFE to remove the duplicate computations caused by transferred filters, which further accelerates transferred CNN models. The first mechanism exploits the shared weights hidden in each row of the transferred filters and reuses the corresponding identical partial sums, eliminating at least 25% of the repetitive computations in each row. The second mechanism intelligently schedules and accesses the memory system to reuse the repetitive partial sums among different rows of the transferred filters, with at least 25% of computations eliminated. Furthermore, an efficient hardware architecture is proposed in the TFE to fully reap the benefits of the two mechanisms, so that different types of networks are flexibly supported. To achieve high energy efficiency, a sub-array-based filter mapping method (SAFM) is proposed, in which the processing element (PE) sub-array is used as the elementary computational unit to support various filters. Therein, input data can be efficiently broadcast within each PE sub-array, and the load on each PE can be stripped out and substantially alleviated, which dramatically reduces area and power consumption. Excluding MobileNet-like networks that adopt depth-wise convolution, most mainstream networks can be compressed and accelerated by the proposed TFE. Two state-of-the-art transferred filter-based methods, i.e., doubly CNN and symmetry CNN, are implemented on the TFE. Compared with Eyeriss, average speedups of 2.93× and 3.17× are achieved on the convolutional layers of various modern CNNs, and the overall energy efficiency is improved by 12.66× and 13.31× on average. Compared with other state-of-the-art related works, the TFE achieves up to a 4.0× parameter reduction, a 2.72× speedup, and a 10.74× energy efficiency improvement on VGGNet.
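To illustrate the kind of redundancy the TFE exploits, the following is a minimal software sketch (not the paper's hardware dataflow). It assumes, for illustration only, that the transferred filters are row-shifted copies of a single base filter (in the spirit of doubly-CNN-style transfer), so identical weight rows recur across filters; the function `conv2d_row_reuse` and its caching scheme are hypothetical names introduced here, not the TFE's actual mechanisms.

```python
# Illustrative sketch of row-level partial-sum reuse: when the same weight row
# meets the same input row segment, the dot product is fetched from a cache
# instead of being recomputed. This only approximates, in software, the kind
# of repetitive computation the TFE removes in hardware.
import numpy as np

def conv2d_row_reuse(x, filters):
    """x: (H, W) input; filters: (F, K, K) weights; stride 1, no padding."""
    H, W = x.shape
    F, K, _ = filters.shape
    out = np.zeros((F, H - K + 1, W - K + 1))
    psum_cache = {}          # (weight-row bytes, input row, column) -> partial sum
    reused = computed = 0
    for oy in range(H - K + 1):
        for ox in range(W - K + 1):
            for f in range(F):
                acc = 0.0
                for r in range(K):
                    w_row = filters[f, r]
                    key = (w_row.tobytes(), oy + r, ox)
                    if key in psum_cache:        # identical weight row already paired
                        acc += psum_cache[key]   # with this input segment: reuse it
                        reused += 1
                    else:
                        p = float(np.dot(x[oy + r, ox:ox + K], w_row))
                        psum_cache[key] = p
                        acc += p
                        computed += 1
                out[f, oy, ox] = acc
    print(f"row partial sums computed: {computed}, reused: {reused}")
    return out

# Toy example: 4 filters built by cyclically shifting the rows of one base
# 3x3 filter, so weight rows repeat across filters and output positions.
base = np.arange(9, dtype=float).reshape(3, 3)
filters = np.stack([np.roll(base, s, axis=0) for s in range(4)])
y = conv2d_row_reuse(np.random.rand(8, 8), filters)
```

In this toy setting a large fraction of the row-level dot products are served from the cache, which is the software analogue of the partial-sum reuse the two TFE mechanisms perform within and across filter rows.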

[1] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[2] Dimitrios Soudris, et al. Efficient Winograd-based Convolution Kernel Implementation on Edge Devices, 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).

[3] H. T. Kung, et al. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization, 2018, ASPLOS.

[4] Heng Huang, et al. Direct Shape Regression Networks for End-to-End Face Alignment, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5] Joel Emer, et al. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks, 2016, ISCA.

[6] Tao Zhang, et al. Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges, 2018, IEEE Signal Processing Magazine.

[7] Xiaogang Wang, et al. Multi-Bias Non-linear Activation in Deep Neural Networks, 2016, ICML.

[8] Tao Zhang, et al. A Survey of Model Compression and Acceleration for Deep Neural Networks, 2017, ArXiv.

[9] Max Welling, et al. Soft Weight-Sharing for Neural Network Compression, 2017, ICLR.

[10] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Hadi Esmaeilzadeh, et al. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network, 2017, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[12] Max Welling, et al. Group Equivariant Convolutional Networks, 2016, ICML.

[13] C.-C. Jay Kuo, et al. Fast face detection on mobile devices by leveraging global and local facial characteristics, 2019, Signal Process. Image Commun.

[14] Christoforos E. Kozyrakis, et al. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators, 2019, ASPLOS.

[15] Yun Liang, et al. SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs, 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).

[16] Zhongfei Zhang, et al. Doubly Convolutional Neural Networks, 2016, NIPS.

[17] Jayakorn Vongkulbhisal, et al. Discriminative Optimization: Theory and Applications to Computer Vision, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] Yiran Chen, et al. 2PFPCE: Two-Phase Filter Pruning Based on Conditional Entropy, 2018, ArXiv.

[19] Xiaogang Wang, et al. Residual Attention Network for Image Classification, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Kyoung Mu Lee, et al. Clustering Convolutional Kernels to Compress Deep Neural Networks, 2018, ECCV.

[21] Honglak Lee, et al. Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units, 2016, ICML.

[22] Tadahiro Kuroda, et al. BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS, 2017, 2017 Symposium on VLSI Circuits.

[23] Eduardo Coutinho, et al. Connecting Subspace Learning and Extreme Learning Machine in Speech Emotion Recognition, 2019, IEEE Transactions on Multimedia.

[24] Christoforos E. Kozyrakis, et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory, 2017, ASPLOS.

[25] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.

[26] Jason Cong, et al. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks, 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[27] Zhimin Li, et al. NLIZE: A Perturbation-Driven Visual Interrogation Tool for Analyzing and Interpreting Natural Language Inference Models, 2019, IEEE Transactions on Visualization and Computer Graphics.

[28] Michael Ferdman, et al. Maximizing CNN accelerator efficiency through resource partitioning, 2016, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[29] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[31] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[32] Youchang Kim, et al. 14.6 A 0.62mW ultra-low-power convolutional-neural-network face-recognition processor and a CIS integrated with always-on haar-like face detector, 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[33] Jae-sun Seo, et al. XNOR-SRAM: In-Memory Computing SRAM Macro for Binary/Ternary Deep Neural Networks, 2018, 2018 IEEE Symposium on VLSI Technology.

[34] William J. Dally, et al. SCNN: An accelerator for compressed-sparse convolutional neural networks, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[35] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Tao Li, et al. Prediction Based Execution on Deep Neural Networks, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[37] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.

[38] Forrest N. Iandola, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size, 2016, ArXiv.

[39] Qiang Li, et al. A 1.17 TOPS/W, 150fps Accelerator for Multi-Face Detection and Alignment, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).

[40] Yiran Chen, et al. Learning Structured Sparsity in Deep Neural Networks, 2016, NIPS.

[41] Koray Kavukcuoglu, et al. Exploiting Cyclic Symmetry in Convolutional Neural Networks, 2016, ICML.

[42] Leibo Liu, et al. An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width, 2019, IEEE Journal of Solid-State Circuits.

[43] Xiaowei Li, et al. C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization, 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).

[44] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations, 2015, NIPS.

[45] Mengjia Yan, et al. UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[46] Hoi-Jun Yoo, et al. UNPU: A 50.6TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision, 2018, 2018 IEEE International Solid-State Circuits Conference (ISSCC).

[47] Jason Cong, et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks, 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[48] Patrick Judd, et al. Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks, 2019, ASPLOS.

[49] Rajesh K. Gupta, et al. SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[50] Jianbo Su, et al. Deep convolutional neural networks compression method based on linear representation of kernels, 2019, International Conference on Machine Vision.

[51] Shenghuo Zhu, et al. Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM, 2017, AAAI.

[52] Jiayu Li, et al. ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers, 2018, ASPLOS.

[53] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[54] Song Han, et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices, 2018, ECCV.

[55] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.

[56] Asifullah Khan, et al. A survey of the recent architectures of deep convolutional neural networks, 2019, Artificial Intelligence Review.