A Case for Dynamic Activation Quantization in CNNs

It is well established that CNNs can tolerate low-precision computation without significant loss in accuracy. Prior work exploits this fact by allocating different precisions to different layers (for both weights and activations), depending on how strongly a layer's precision affects prediction accuracy. In all of these works, the layer-wise precision of weights and activations is fixed for a network through an offline design-space exploration combined with retraining of weights. While these approaches show significant energy improvements, they make global decisions about precision requirements. In this project, we try to answer the question: "Can we vary the inter- and intra-layer bit precision based on the region-wise importance of the individual input?" The intuition is that a particular image may contain regions, such as background, that are unimportant to the network's final prediction; as the input propagates through the network, the corresponding regions of each feature map can tolerate lower precision. Using metrics such as entropy, color gradient, and points of interest, we argue that a region of an image can be labeled important or unimportant, enabling lower precision for unimportant pixels. We show that per-input activation quantization can reduce computational energy by up to 33.5% while maintaining the original Top-1 accuracy, or by up to 42.0% while maintaining the original Top-5 accuracy.
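The idea of labeling regions by an importance metric and quantizing unimportant ones more aggressively can be sketched as follows. This is a minimal illustration, not the paper's implementation: the patch size, entropy threshold, and bit widths (`patch`, `thresh`, `hi_bits`, `lo_bits`) are assumed values chosen for the example, and Shannon entropy stands in for the broader set of metrics (color gradient, points of interest) mentioned above.

```python
import numpy as np

def region_entropy(patch, bins=16):
    # Shannon entropy (bits) of the intensity histogram of a patch,
    # used here as a proxy for region importance.
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def quantize(x, bits):
    # Uniform quantization of values in [0, 1] to 2**bits levels.
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def per_region_quantize(activation, patch=8, hi_bits=8, lo_bits=4, thresh=2.0):
    # Tile the activation map into patches; low-entropy ("unimportant")
    # patches are quantized with fewer bits than high-entropy ones.
    h, w = activation.shape
    out = np.empty_like(activation)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            blk = activation[i:i + patch, j:j + patch]
            bits = hi_bits if region_entropy(blk) >= thresh else lo_bits
            out[i:i + patch, j:j + patch] = quantize(blk, bits)
    return out
```

In a bit-serial accelerator such as Stripes, execution time and energy scale with the number of activation bits processed, so lowering the bit width of unimportant patches translates directly into energy savings.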
