Deep Convolutional Neural Network Accelerator Featuring Conditional Computing and Low External Memory Access

This paper presents an ASIC accelerator for deep convolutional neural networks (DCNNs) featuring a novel conditional computing scheme that synergistically combines precision-cascading with zero-skipping. To reduce the many redundant convolution operations whose results are discarded by subsequent max-pooling, we propose precision-cascading: the input features are divided into a number of low-precision groups, and approximate convolutions using only the most significant bits (MSBs) are performed first. Guided by this approximate computation, the full-precision convolution is then performed only for the window that yields the pooling maximum. This way, the total number of bit-wise convolutions is reduced by ~2× without affecting the output feature values and with <0.8% degradation in final ImageNet classification accuracy. Precision-cascading provides the added benefit of increased sparsity within each low-precision group, which we exploit with zero-skipping to eliminate clock cycles as well as external memory accesses that involve zero inputs. By jointly optimizing the conditional computing scheme and the hardware architecture, the 40nm prototype chip demonstrates a peak energy efficiency of 8.85 TOPS/W at a 0.9V supply and low external memory access of 55.31 MB (0.0018 access/MAC) for ImageNet classification with the VGG-16 CNN.
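
As a rough, illustrative sketch of the precision-cascading and zero-skipping ideas described above (not the chip's actual datapath or dataflow), the NumPy code below evaluates every convolution window that feeds one max-pooling output using only the activations' MSBs, skips zero activations in that pass, and then recomputes only the approximate winner at full precision. The function names (msb_part, conv_window, pooled_output) and the 4-bit/8-bit precision split are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

MSB_BITS = 4   # bits used in the approximate (MSB-only) pass -- illustrative choice
FULL_BITS = 8  # assumed full activation precision for this sketch

def msb_part(x, msb_bits=MSB_BITS, full_bits=FULL_BITS):
    """Keep only the top `msb_bits` of each activation (zero out the LSBs)."""
    shift = full_bits - msb_bits
    return (x >> shift) << shift

def conv_window(acts, weights):
    """Dot product over one convolution window, skipping zero activations."""
    acc = 0
    for a, w in zip(acts.ravel(), weights.ravel()):
        if a == 0:          # zero-skipping: no multiply-accumulate for zero inputs
            continue
        acc += int(a) * int(w)
    return acc

def pooled_output(windows, weights):
    """
    Precision-cascading over one max-pooling group.
    `windows` holds the activation patches that feed the same pooling output:
      1) score each window approximately with MSB-only activations,
      2) recompute only the approximate winner at full precision.
    """
    approx = [conv_window(msb_part(w), weights) for w in windows]
    winner = int(np.argmax(approx))
    return conv_window(windows[winner], weights)

# Toy usage: four 3x3 windows of 8-bit activations feeding one 2x2 pooling output.
rng = np.random.default_rng(0)
windows = [rng.integers(0, 256, size=(3, 3), dtype=np.int64) for _ in range(4)]
weights = rng.integers(-8, 8, size=(3, 3), dtype=np.int64)
print(pooled_output(windows, weights))
```

Note that in software this sketch only saves multiply-accumulate work; in the accelerator, as the abstract states, the skipped zero inputs also eliminate the corresponding clock cycles and external memory accesses, and truncating to MSBs increases the sparsity that zero-skipping can exploit.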
