Distilling Pixel-Wise Feature Similarities for Semantic Segmentation

Among neural network compression techniques, knowledge distillation is an effective approach that forces a simpler student network to mimic the output of a larger teacher network. However, most such distillation methods focus on image-level classification, and directly adapting them to semantic segmentation brings only marginal improvements. In this paper, we propose a simple yet effective knowledge representation, referred to as pixel-wise feature similarities (PFS), to tackle the challenging distillation problem of semantic segmentation. PFS encodes spatial structural information at each pixel location of the high-level convolutional features, which provides easier-to-follow guidance for the distillation process. Furthermore, a novel weighted pixel-level soft prediction imitation approach is proposed to let the student network selectively mimic the teacher network's output, according to their pixel-wise knowledge gaps. Extensive experiments are conducted on the challenging Pascal VOC 2012, ADE20K, and Pascal Context datasets. Our approach brings significant performance improvements over several strong baselines and achieves new state-of-the-art results.
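
The abstract does not give the exact formulation of PFS or of the weighted soft prediction imitation, but the two ideas can be made concrete with a small sketch. The snippet below is only an illustration under assumed design choices: it treats PFS as a pairwise cosine-similarity map over pixel features matched between teacher and student, and it approximates the per-pixel knowledge gap by the (detached) teacher-student KL divergence. The function names (pfs_loss, weighted_soft_loss), the temperature parameter, and the normalization scheme are hypothetical and not taken from the paper.

```python
# Minimal sketch of the two distillation losses described in the abstract.
# All concrete choices here (cosine similarity, KL-based gap weights) are
# assumptions made to illustrate the idea, not the authors' implementation.
import torch
import torch.nn.functional as F


def pfs_loss(feat_s: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
    """Pixel-wise feature similarity (PFS) distillation, sketched.

    feat_s, feat_t: (B, C, H, W) high-level feature maps of student and
    teacher, assumed to share the same spatial size (channels may differ).
    """
    def similarity_map(feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        f = feat.flatten(2)                      # (B, C, H*W)
        f = F.normalize(f, dim=1)                # unit-norm feature per pixel
        return torch.bmm(f.transpose(1, 2), f)   # (B, HW, HW) cosine similarities

    # Match the student's pairwise pixel-affinity structure to the teacher's.
    return F.mse_loss(similarity_map(feat_s), similarity_map(feat_t))


def weighted_soft_loss(logits_s: torch.Tensor, logits_t: torch.Tensor,
                       temperature: float = 1.0) -> torch.Tensor:
    """Weighted pixel-level soft prediction imitation, sketched.

    Each pixel's KL term is re-weighted by a per-pixel "knowledge gap",
    approximated here by the detached KL itself, so pixels where the student
    disagrees most with the teacher contribute most to the loss.
    """
    log_p_s = F.log_softmax(logits_s / temperature, dim=1)
    p_t = F.softmax(logits_t / temperature, dim=1)
    kl_per_pixel = (p_t * (torch.log(p_t + 1e-8) - log_p_s)).sum(dim=1)  # (B, H, W)
    weights = kl_per_pixel.detach()
    weights = weights / (weights.sum(dim=(1, 2), keepdim=True) + 1e-8)
    return (weights * kl_per_pixel).sum(dim=(1, 2)).mean()
```

In a full training setup one would presumably combine these two terms with the standard pixel-wise cross-entropy segmentation loss, with weighting coefficients tuned on a validation set; note also that the HW x HW similarity map can be memory-heavy, so in practice it may be computed on downsampled feature maps.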
