Towards Efficient Compact Network Training on Edge-Devices

Currently, there is a trend to deploy training on edge devices, which is crucial for future AI applications in scenarios with transfer and online learning demands. In particular, directly deploying models trained on a generic dataset can cause a severe degradation of accuracy, because the local environment forms an edge-local dataset that often differs from the generic one. However, training on edge devices with limited computing and memory capability is a challenging problem. In this paper, we propose a novel quantization training framework for efficient compact network training on edge devices. First, training-aware symmetric quantization is introduced to quantize all of the data types involved in the training process. Then, a channel-wise quantization method is adopted for compact network quantization; it has a significantly higher tolerance to quantization errors and makes the training process more stable. For further training efficiency, we build a hardware evaluation platform to evaluate different network settings, so as to achieve a better trade-off among accuracy, energy, and latency. Finally, we evaluate two widely used compact networks on a domain adaptation dataset for image classification. The results demonstrate that the proposed methods achieve an 8.4×-17.2× reduction in energy and an 11.9×-16.3× reduction in latency compared with 32-bit implementations, while maintaining classification accuracy.
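
The channel-wise symmetric quantization mentioned above can be sketched as follows. This is a minimal illustrative example, not the paper's actual implementation: the function names, the 8-bit setting, and the per-output-channel scale choice are assumptions for illustration only.

```python
import numpy as np

def symmetric_quantize_per_channel(weights, num_bits=8):
    """Symmetrically quantize a conv weight tensor (out_ch, in_ch, kh, kw)
    with an independent scale per output channel (illustrative sketch)."""
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for 8-bit
    out_ch = weights.shape[0]
    flat = weights.reshape(out_ch, -1)
    scales = np.abs(flat).max(axis=1) / qmax     # one scale per channel
    scales = np.maximum(scales, 1e-8)            # guard against all-zero channels
    q = np.clip(np.round(flat / scales[:, None]), -qmax, qmax).astype(np.int8)
    return q.reshape(weights.shape), scales

def dequantize_per_channel(q_weights, scales):
    """Recover an approximate floating-point tensor from integer weights."""
    out_ch = q_weights.shape[0]
    flat = q_weights.reshape(out_ch, -1).astype(np.float32) * scales[:, None]
    return flat.reshape(q_weights.shape)

if __name__ == "__main__":
    w = np.random.randn(32, 16, 3, 3).astype(np.float32)  # toy conv weights
    q, s = symmetric_quantize_per_channel(w, num_bits=8)
    w_hat = dequantize_per_channel(q, s)
    print("max abs quantization error:", np.abs(w - w_hat).max())
```

Per-channel scales matter for compact networks because depthwise layers often have widely varying per-channel weight ranges; a single per-tensor scale would force small-range channels to be quantized too coarsely.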
