Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks

Deep neural models have in recent years been successful in almost every field, including on extremely complex problems. However, these models are huge, with millions (and even billions) of parameters, and thus demand heavy computation power and are difficult to deploy on edge devices. Besides, their performance gains depend heavily on large amounts of labeled data. To achieve faster speeds and to handle the problems caused by the lack of data, knowledge distillation (KD) has been proposed to transfer information learned from one model to another. KD is often characterized by the so-called "Student-Teacher" (S-T) learning framework and has been broadly applied in model compression and knowledge transfer. This paper is about KD and S-T learning, which have been actively studied in recent years. First, we aim to explain what KD is and how and why it works. Then, we provide a comprehensive survey of recent progress in KD methods together with S-T frameworks, typically for vision tasks. In general, we consider some fundamental questions that have been driving this research area and thoroughly summarize the research progress and technical details. Additionally, we systematically analyze the research status of KD in vision applications. Finally, we discuss the potential and open challenges of existing methods and outline future directions for KD and S-T learning.


[246]  Cordelia Schmid,et al.  End-to-End Incremental Learning , 2018, ECCV.

[247]  Yale Song,et al.  Learning from Noisy Labels with Distillation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[248]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[249]  Nicholas Rhinehart,et al.  N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning , 2017, ICLR.

[250]  Geoffrey E. Hinton,et al.  Large scale distributed neural network training through online distillation , 2018, ICLR.

[251]  Koh Takeuchi,et al.  Few-shot learning of neural networks from scratch by pseudo example optimization , 2018, BMVC.

[252]  Li Sun,et al.  Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[253]  Derek Hoiem,et al.  Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[254]  Vineeth N. Balasubramanian,et al.  Deep Model Compression: Distilling Knowledge from Noisy Teachers , 2016, ArXiv.

[255]  Kaigui Bian,et al.  Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach , 2020, ArXiv.

[256]  Zachary Chase Lipton,et al.  Born Again Neural Networks , 2018, ICML.

[257]  Shiming Ge,et al.  Low-Resolution Face Recognition in the Wild via Selective Knowledge Distillation , 2018, IEEE Transactions on Image Processing.

[258]  Mitesh M. Khapra,et al.  On Knowledge distillation from complex networks for response prediction , 2019, NAACL.

[259]  Xiaogang Wang,et al.  Face Model Compression by Distilling Knowledge from Neurons , 2016, AAAI.

[260]  Cordelia Schmid,et al.  Incremental Learning of Object Detectors without Catastrophic Forgetting , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[261]  Hassan Ghasemzadeh,et al.  Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher , 2019, ArXiv.

[262]  Soheil Feizi,et al.  Compressing GANs using Knowledge Distillation , 2019, ArXiv.

[263]  Andrey Malinin,et al.  Ensemble Distribution Distillation , 2019, ICLR.

[264]  Joost van de Weijer,et al.  Learning Metrics From Teachers: Compact Networks for Image Embedding , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[265]  Inés María Galván,et al.  A Selective Learning Method to Improve the Generalization of Multilayer Feedforward Neural Networks , 2001, Int. J. Neural Syst..

[266]  Antonio Torralba,et al.  Through-Wall Human Pose Estimation Using Radio Signals , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[267]  Kaisheng Ma,et al.  Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[268]  Mandar Kulkarni,et al.  Knowledge distillation using unlabeled mismatched images , 2017, ArXiv.

[269]  Yafei Song,et al.  Ultrafast Video Attention Prediction with Coupled Knowledge Distillation , 2019, AAAI.

[270]  Jian Liu,et al.  Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection , 2019, AAAI.

[271]  Nojun Kwak,et al.  Feature-map-level Online Adversarial Knowledge Distillation , 2020, ICML.

[272]  Stefano Mattoccia,et al.  Learning Monocular Depth Estimation Infusing Traditional Stereo Knowledge , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[273]  Tony X. Han,et al.  Learning Efficient Object Detection Models with Knowledge Distillation , 2017, NIPS.

[274]  Andrew Zisserman,et al.  Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[275]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[276]  Nojun Kwak,et al.  FEED: Feature-level Ensemble for Knowledge Distillation , 2019, ArXiv.

[277]  Rui Zhang,et al.  KDGAN: Knowledge Distillation with Generative Adversarial Networks , 2018, NeurIPS.

[278]  Bhuvana Ramabhadran,et al.  Efficient Knowledge Distillation from an Ensemble of Teachers , 2017, INTERSPEECH.

[279]  Kartikeya Bhardwaj,et al.  Dream Distillation: A Data-Independent Model Compression Framework , 2019, ArXiv.

[280]  Ming Gong,et al.  Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System , 2019, WSDM.

[281]  Zhiyuan Liu,et al.  Graph Neural Networks: A Review of Methods and Applications , 2018, AI Open.

[282]  David Barber,et al.  The IM algorithm: a variational approach to Information Maximization , 2003, NIPS 2003.

[283]  Matthew Crosby,et al.  Association for the Advancement of Artificial Intelligence , 2014 .

[284]  Irwin King,et al.  Few Shot Network Compression via Cross Distillation , 2020, AAAI.

[285]  Mingli Song,et al.  Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning , 2019, IJCAI.

[286]  Suk-Ju Kang,et al.  Teaching Where to See: Knowledge Distillation-Based Attentive Information Transfer in Vehicle Maker Classification , 2019, IEEE Access.

[287]  Ming-Hsuan Yang,et al.  Collaborative Distillation for Ultra-Resolution Universal Style Transfer , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[288]  Christoph H. Lampert,et al.  Distillation-Based Training for Multi-Exit Architectures , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[289]  Hironobu Fujiyoshi,et al.  Knowledge Transfer Graph for Deep Collaborative Learning , 2019, ArXiv.

[290]  Guiguang Ding,et al.  Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification , 2020, ECCV.

[291]  Xiangyang Xue,et al.  Regional Gating Neural Networks for Multi-label Image Classification , 2016, BMVC.

[292]  Phongtharin Vinayavekhin,et al.  Unifying Heterogeneous Classifiers With Distillation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[293]  Tae-Hyun Oh,et al.  On Learning Associations of Faces and Voices , 2018, ACCV.

[294]  Huan Wang,et al.  Triplet Distillation For Deep Face Recognition , 2019, 2020 IEEE International Conference on Image Processing (ICIP).

[295]  Long Chen,et al.  Learning Lightweight Pedestrian Detector with Hierarchical Knowledge Distillation , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[296]  Michal Valko,et al.  Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.

[297]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[298]  Andrew Zisserman,et al.  Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[299]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[300]  Mehak Mehak,et al.  Knowledge Distillation from MultipleTeachers using Visual Explanations , 2018 .

[301]  Kuk-Jin Yoon,et al.  SpherePHD: Applying CNNs on a Spherical PolyHeDron Representation of 360 degree Images , 2018, ArXiv.

[302]  Yo-Sung Ho,et al.  Event-Based High Dynamic Range Image and Very High Frame Rate Video Generation Using Conditional Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[303]  Xu Lan,et al.  Self-Referenced Deep Learning , 2018, ACCV.

[304]  Jianping Fan,et al.  MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization , 2019, ArXiv.

[305]  Cheng Li,et al.  DeepGraph: Graph Structure Predicts Network Growth , 2016, ArXiv.

[306]  Alan L. Yuille,et al.  Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students , 2018, AAAI.

[307]  Zhi Zhang,et al.  Knowledge Projection for Deep Neural Networks , 2017, ArXiv.

[308]  Huan Wang,et al.  MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models , 2019 .

[309]  Jin Young Choi,et al.  Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons , 2018, AAAI.