HAKD: Hardware Aware Knowledge Distillation

Despite recent developments, deploying deep neural networks on resource-constrained, general-purpose hardware remains a significant challenge. Much work has gone into developing methods for reshaping neural networks, usually with a focus on minimising total parameter count. These methods are typically developed in a hardware-agnostic manner and do not exploit hardware behaviour. In this paper we propose a new approach, Hardware Aware Knowledge Distillation (HAKD), which uses empirical observations of hardware behaviour to design efficient student networks that are then trained with knowledge distillation. This allows the trade-off between accuracy and performance to be managed explicitly. We apply this approach across three platforms and evaluate it on two networks, MobileNet and DenseNet, on CIFAR-10. We show that HAKD outperforms Deep Compression and Fisher pruning in terms of size, accuracy and performance.
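
For readers unfamiliar with the training step mentioned above, the sketch below shows the standard Hinton-style knowledge-distillation objective in PyTorch, where a compact student is trained against a teacher's softened outputs plus the ground-truth labels. This is a minimal illustration of the generic technique; the temperature and loss-weighting values are assumptions for the example, not the settings used in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Blend of soft-target KL divergence and hard-label cross-entropy."""
    # Soft targets: match the teacher's temperature-scaled distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale gradients for the softened targets
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In a typical training loop the teacher runs in evaluation mode with gradients disabled, and only the student's parameters are updated with this loss.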
