O-2A: Low Overhead DNN Compression with Outlier-Aware Approximation

We present Outlier-Aware Approximation (O-2A) coding, a low-latency DNN compression technique that reduces DRAM energy, a significant component of DNN inference cost. The technique compresses 8-bit integers, the de facto standard of DNN inference, to 6 bits without degrading network accuracy. Because its hardware overhead is small, the O-2A codec can easily be embedded in a DRAM controller. On an Eyeriss platform, O-2A coding improves both DRAM energy and system performance by 18~20%. O-2A coding also enables an error-correction scheme without additional parity overhead, opening the possibility of an approximate DRAM that simultaneously reduces DRAM access and refresh energy.
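The abstract does not spell out the O-2A bit layout, so the following is a minimal sketch under assumed parameters: each signed 8-bit value maps to a 6-bit code consisting of one flag bit plus five payload bits; "inlier" values that fit in five signed bits are stored exactly, while "outlier" values are approximated by discarding their three least-significant bits. The names o2a_encode/o2a_decode and the 1+5-bit format are hypothetical illustrations, not the paper's actual codec.

    # Sketch of an outlier-aware 8-bit -> 6-bit code (ASSUMED layout:
    # 1 flag bit + 5 payload bits; not the paper's exact scheme).

    def o2a_encode(x: int) -> int:
        """Encode a signed 8-bit integer (-128..127) into a 6-bit code."""
        assert -128 <= x <= 127
        if -16 <= x <= 15:                   # inlier: fits in 5 signed bits
            return (0 << 5) | (x & 0x1F)     # flag=0, exact 5-bit payload
        # outlier: keep the 5 most-significant bits, drop the bottom 3
        return (1 << 5) | ((x >> 3) & 0x1F)  # flag=1, approximate payload

    def o2a_decode(code: int) -> int:
        """Decode a 6-bit code back to a signed 8-bit integer."""
        flag, payload = code >> 5, code & 0x1F
        value = payload - 32 if payload & 0x10 else payload  # sign-extend
        return value if flag == 0 else value << 3            # restore scale

    if __name__ == "__main__":
        for x in (0, 7, -16, 100, -90):
            code = o2a_encode(x)
            print(f"{x:4d} -> code {code:06b} -> {o2a_decode(code):4d}")

Under this assumed layout, inliers round-trip exactly and outliers incur a bounded error of at most 7 (the discarded low-order bits), which matches the abstract's premise that small approximation of infrequent outliers leaves DNN accuracy intact.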
