Industrial Cyber-Physical Systems-Based Cloud IoT Edge for Federated Heterogeneous Distillation

Deep convolutional networks have achieved remarkable performance in a wide range of vision-based tasks in the modern Internet of Things (IoT). Because of privacy concerns and transmission costs, the manually annotated data used to train deep learning models are usually stored at different sites, on fog and edge devices of varying computing capacity. Knowledge distillation has been shown to effectively compress well-trained neural networks into lightweight models suited to particular devices. However, different fog and edge devices may perform different sub-tasks, and simply performing model compression on powerful cloud servers fails to make use of the private data stored at the different sites. To overcome these obstacles, we propose a novel knowledge distillation method for object recognition in real-world IoT scenarios. Our method enables flexible bidirectional online training of heterogeneous models on distributed datasets through a new ``brain storming'' mechanism and optimizable temperature parameters. In our comparison experiments, this heterogeneous brain-storming method was compared with multiple state-of-the-art single-model compression methods, as well as the latest heterogeneous and homogeneous multi-teacher knowledge distillation methods, and it outperformed the state of the art on both conventional and heterogeneous tasks. Further analysis of the ablation results shows that introducing trainable temperature parameters into the conventional knowledge distillation loss effectively eases the learning process of the student networks across different methods. To the best of our knowledge, this is the first IoT-oriented method that allows asynchronous bidirectional heterogeneous knowledge distillation in deep networks.
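To make the ``optimizable temperature'' idea concrete, the following is a minimal sketch of a conventional softened-softmax distillation loss (in the sense of Hinton et al.) in which the temperature is a learnable parameter trained jointly with the student, written in PyTorch. The class name TrainableTemperatureKD, the initial temperature, and the loss weighting are illustrative assumptions for this sketch, not the paper's exact formulation.

    # Sketch: conventional KD loss with a learnable temperature (PyTorch).
    # Names and hyperparameters below are illustrative assumptions only.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TrainableTemperatureKD(nn.Module):
        """Softened-softmax distillation loss whose temperature is optimized
        together with the student parameters (a hypothetical realization of
        the trainable-temperature idea described in the abstract)."""

        def __init__(self, init_temperature: float = 4.0, alpha: float = 0.5):
            super().__init__()
            # Store log(T) as a learnable scalar so the temperature stays positive.
            self.log_t = nn.Parameter(torch.log(torch.tensor(init_temperature)))
            self.alpha = alpha  # balance between hard-label loss and distillation loss

        def forward(self, student_logits, teacher_logits, labels):
            t = torch.exp(self.log_t)
            # Standard cross-entropy on the ground-truth labels.
            ce = F.cross_entropy(student_logits, labels)
            # KL divergence between temperature-softened distributions,
            # scaled by t^2 as in conventional knowledge distillation.
            kd = F.kl_div(
                F.log_softmax(student_logits / t, dim=1),
                F.softmax(teacher_logits / t, dim=1),
                reduction="batchmean",
            ) * (t * t)
            return self.alpha * ce + (1.0 - self.alpha) * kd

    # Usage sketch: include the loss module's parameter in the optimizer so the
    # temperature receives gradients alongside the student network's weights.
    # criterion = TrainableTemperatureKD()
    # optimizer = torch.optim.SGD(
    #     list(student.parameters()) + list(criterion.parameters()), lr=0.01)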
