Octo: INT8 Training with Loss-aware Compensation and Backward Quantization for Tiny On-device Learning
Jingren Zhou | Qihua Zhou | Zhihao Qu | Boyuan Luo | Jingcai Guo | Zhenda Xu | Song Guo | Jiewei Zhang | Tao Guo
[1] Yurong Chen, et al. Dynamic Network Surgery for Efficient DNNs, 2016, NIPS.
[2] Elad Hoffer, et al. ACIQ: Analytical Clipping for Integer Quantization of neural networks, 2018, ArXiv.
[3] Hiroki Matsutani, et al. A Neural Network-Based On-Device Learning Anomaly Detector for Edge Devices, 2019, IEEE Transactions on Computers.
[4] Swagath Venkataramani, et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks, 2018, ArXiv.
[5] Elad Hoffer, et al. Scalable Methods for 8-bit Training of Neural Networks, 2018, NeurIPS.
[6] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[7] Wencong Xiao, et al. Gandiva: Introspective Cluster Scheduling for Deep Learning, 2018, OSDI.
[8] Zhijian Liu, et al. HAQ: Hardware-Aware Automated Quantization With Mixed Precision, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Yong Zhao, et al. Binarized Neural Networks on the ImageNet Classification Task, 2016.
[10] Joos Vandewalle, et al. A Multilinear Singular Value Decomposition, 2000, SIAM J. Matrix Anal. Appl.
[11] Nicholas D. Lane, et al. DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware, 2017, MobiSys.
[12] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[13] Steven Skiena, et al. DeepWalk: online learning of social representations, 2014, KDD.
[14] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[15] Yaoliang Yu, et al. Distributed Machine Learning via Sufficient Factor Broadcasting, 2015, ArXiv.
[16] Jaesik Choi, et al. HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism, 2020, USENIX ATC.
[17] Vikas Chandra, et al. CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs, 2018, ArXiv.
[18] Chuang Gan, et al. Once for All: Train One Network and Specialize it for Efficient Deployment, 2019, ICLR.
[19] Yann LeCun, et al. Regularization of Neural Networks using DropConnect, 2013, ICML.
[20] Ali Farhadi, et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, 2016, ECCV.
[21] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[22] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.
[23] Wonyong Sung, et al. Structured Pruning of Deep Convolutional Neural Networks, 2015, ACM J. Emerg. Technol. Comput. Syst.
[24] Yibo Zhu, et al. A generic communication scheduler for distributed DNN training acceleration, 2019, SOSP.
[25] Xu Chen, et al. Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing, 2019, Proceedings of the IEEE.
[26] Rajendra Akerkar, et al. On-Device Learning Systems for Edge Intelligence: A Software and Hardware Synergy Perspective, 2021, IEEE Internet of Things Journal.
[27] Dumitru Erhan, et al. Going deeper with convolutions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Wei Liu, et al. SSD: Single Shot MultiBox Detector, 2015, ECCV.
[29] Markus Nagel, et al. Data-Free Quantization Through Weight Equalization and Bias Correction, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[30] Jacek M. Zurada, et al. Feature Selection for Neural Networks Using Group Lasso Regularization, 2020, IEEE Transactions on Knowledge and Data Engineering.
[31] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[32] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[33] Jae-Joon Han, et al. Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[35] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[36] Yaoliang Yu, et al. Lighter-Communication Distributed Machine Learning via Sufficient Factor Broadcasting, 2016, UAI.
[37] Song Han, et al. DSD: Dense-Sparse-Dense Training for Deep Neural Networks, 2016, ICLR.
[38] Rémi Gribonval, et al. And the Bit Goes Down: Revisiting the Quantization of Neural Networks, 2019, ICLR.
[39] Xianglong Liu, et al. Towards Unified INT8 Training for Convolutional Neural Network, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Song Han, et al. TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning, 2020, NeurIPS.
[41] Song Han, et al. Exploring the Regularity of Sparse Structure in Convolutional Neural Networks, 2017, ArXiv.
[42] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[43] Xin Wang, et al. SkipNet: Learning Dynamic Routing in Convolutional Networks, 2017, ECCV.
[44] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[45] Jason Weston, et al. Natural Language Processing (Almost) from Scratch, 2011, J. Mach. Learn. Res.
[46] H. T. Kung, et al. Full-stack optimization for accelerating CNNs using powers-of-two weights with FPGA validation, 2019, ICS.
[47] Pengtao Xie, et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters, 2017, USENIX Annual Technical Conference.
[48] Xin Dong, et al. Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon, 2017, NIPS.
[49] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[50] Wei Wang, et al. Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks, 2020, ICLR.
[51] Xiao Zeng, et al. NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision, 2018, MobiCom.
[52] Bo Chen, et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[53] Max Welling, et al. Relaxed Quantization for Discretized Neural Networks, 2018, ICLR.
[54] Song Han, et al. MCUNet: Tiny Deep Learning on IoT Devices, 2020, NeurIPS.
[55] Yoshua Bengio, et al. Neural Networks with Few Multiplications, 2015, ICLR.