QUEST: Quantized embedding space for transferring knowledge

Knowledge distillation refers to the process of training a compact student network to achieve better accuracy by learning from a high-capacity teacher network. Most existing knowledge distillation methods direct the student to follow the teacher by matching the teacher's output, its feature maps, or their distributions. In this work, we propose a novel way to achieve this goal: distilling the knowledge through a quantized space. In our method, the teacher's feature maps are quantized to represent the main visual concepts they encompass. The student is then trained to predict this quantized representation, and this prediction task is how the student learns from the teacher. Despite its simplicity, our approach yields results that improve on the state of the art in knowledge distillation. To support this claim, we provide an extensive evaluation across several network architectures and the most commonly used benchmark datasets.
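
A minimal PyTorch sketch may help make the quantize-and-predict idea concrete. The abstract does not specify how the codebook of visual concepts is built, how assignments are computed, or what loss is used, so everything below (the names `quantize_teacher_features`, `quest_loss`, `codebook`, the soft assignment with a `temperature`, and the cross-entropy objective) is an illustrative assumption, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def quantize_teacher_features(teacher_feats, codebook, temperature=1.0):
    """Soft-assign each teacher feature vector to a codebook of visual words.

    teacher_feats: (N, D) feature vectors from the teacher (e.g. one per
                   spatial location of a feature map).
    codebook:      (K, D) visual-word centers, e.g. obtained with k-means
                   over teacher features (assumed here).
    Returns:       (N, K) assignment probabilities over the K words.
    """
    # Squared Euclidean distance from every feature to every codeword.
    dists = torch.cdist(teacher_feats, codebook) ** 2
    # Closer codewords receive higher assignment probability.
    return F.softmax(-dists / temperature, dim=-1)

def quest_loss(student_logits, teacher_feats, codebook, temperature=1.0):
    """Cross-entropy between the student's predicted word distribution and
    the teacher's (soft) quantized representation."""
    with torch.no_grad():
        targets = quantize_teacher_features(teacher_feats, codebook, temperature)
    log_probs = F.log_softmax(student_logits, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()

if __name__ == "__main__":
    # Toy usage with random tensors standing in for real network outputs.
    N, D, K = 8, 64, 32
    codebook = torch.randn(K, D)           # stand-in for k-means centers of teacher features
    teacher_feats = torch.randn(N, D)      # stand-in for a flattened teacher feature map
    student_logits = torch.randn(N, K, requires_grad=True)  # student prediction head output
    loss = quest_loss(student_logits, teacher_feats, codebook)
    loss.backward()
    print(loss.item())
```

Under these assumptions, the student never regresses the teacher's raw features; it only has to predict which discrete visual concepts the teacher activates, which is the simpler task the abstract refers to.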
