Deep Convolutional Neural Network Accelerator Featuring Conditional Computing and Low External Memory Access

This paper presents an ASIC accelerator for deep convolutional neural networks (DCNNs) featuring a novel conditional computing scheme that synergistically combines precision-cascading with zero-skipping. To reduce the many redundant convolution operations whose results are discarded by subsequent max-pooling, we propose precision-cascading: the input features are divided into a number of low-precision groups, and approximate convolutions using only the most significant bits (MSBs) are performed first. Guided by this approximate computation, the full-precision convolution is then performed only for the window that yields the pooling maximum. This way, the total number of bit-wise convolutions is reduced by ~2× without affecting the output feature values and with <0.8% degradation in final ImageNet classification accuracy. Precision-cascading provides the added benefit of increased sparsity within each low-precision group, which we exploit with zero-skipping to eliminate clock cycles as well as external memory accesses that involve zero inputs. By jointly optimizing the conditional computing scheme and the hardware architecture, the 40nm prototype chip demonstrates a peak energy efficiency of 8.85 TOPS/W at a 0.9V supply and low external memory access of 55.31 MB (0.0018 access/MAC) for ImageNet classification with the VGG-16 CNN.
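
As a rough, illustrative sketch of the precision-cascading and zero-skipping ideas described above (not the chip's actual datapath or dataflow), the NumPy code below evaluates every convolution window that feeds one max-pooling output using only the activations' MSBs, skips zero activations in that pass, and then recomputes only the approximate winner at full precision. The function names (msb_part, conv_window, pooled_output) and the 4-bit/8-bit precision split are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

MSB_BITS = 4   # bits used in the approximate (MSB-only) pass -- illustrative choice
FULL_BITS = 8  # assumed full activation precision for this sketch

def msb_part(x, msb_bits=MSB_BITS, full_bits=FULL_BITS):
    """Keep only the top `msb_bits` of each activation (zero out the LSBs)."""
    shift = full_bits - msb_bits
    return (x >> shift) << shift

def conv_window(acts, weights):
    """Dot product over one convolution window, skipping zero activations."""
    acc = 0
    for a, w in zip(acts.ravel(), weights.ravel()):
        if a == 0:          # zero-skipping: no multiply-accumulate for zero inputs
            continue
        acc += int(a) * int(w)
    return acc

def pooled_output(windows, weights):
    """
    Precision-cascading over one max-pooling group.
    `windows` holds the activation patches that feed the same pooling output:
      1) score each window approximately with MSB-only activations,
      2) recompute only the approximate winner at full precision.
    """
    approx = [conv_window(msb_part(w), weights) for w in windows]
    winner = int(np.argmax(approx))
    return conv_window(windows[winner], weights)

# Toy usage: four 3x3 windows of 8-bit activations feeding one 2x2 pooling output.
rng = np.random.default_rng(0)
windows = [rng.integers(0, 256, size=(3, 3), dtype=np.int64) for _ in range(4)]
weights = rng.integers(-8, 8, size=(3, 3), dtype=np.int64)
print(pooled_output(windows, weights))
```

Note that in software this sketch only saves multiply-accumulate work; in the accelerator, as the abstract states, the skipped zero inputs also eliminate the corresponding clock cycles and external memory accesses, and truncating to MSBs increases the sparsity that zero-skipping can exploit.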
