Knowledge Distillation for Semantic Segmentation Using Channel and Spatial Correlations and Adaptive Cross Entropy

In this paper, we propose an efficient knowledge distillation method to train light networks using heavy networks for semantic segmentation. Most semantic segmentation networks that exhibit good accuracy are based on computationally expensive networks. These networks are not suitable for mobile applications using vision sensors, because computational resources are limited in these environments. In this view, knowledge distillation, which transfers knowledge from heavy networks acting as teachers to light networks as students, is suitable methodology. Although previous knowledge distillation approaches have been proven to improve the performance of student networks, most methods have some limitations. First, they tend to use only the spatial correlation of feature maps and ignore the relational information of their channels. Second, they can transfer false knowledge when the results of the teacher networks are not perfect. To address these two problems, we propose two loss functions: a channel and spatial correlation (CSC) loss function and an adaptive cross entropy (ACE) loss function. The former computes the full relationship of both the channel and spatial information in the feature map, and the latter adaptively exploits one-hot encodings using the ground truth labels and the probability maps predicted by the teacher network. To evaluate our method, we conduct experiments on scene parsing datasets: Cityscapes and Camvid. Our method presents significantly better performance than previous methods.

[1]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[2]  Roberto Cipolla,et al.  Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.

[3]  Rui Zhang,et al.  KDGAN: Knowledge Distillation with Generative Adversarial Networks , 2018, NeurIPS.

[4]  Jun Fu,et al.  Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Gang Yu,et al.  BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[7]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[8]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[9]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[10]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[11]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  S. Dwivedi,et al.  Obesity May Be Bad: Compressed Convolutional Networks for Biomedical Image Segmentation , 2020 .

[13]  Linda G. Shapiro,et al.  ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation , 2018, ECCV.

[14]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Ke Chen,et al.  Structured Knowledge Distillation for Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Jingdong Wang,et al.  OCNet: Object Context Network for Scene Parsing , 2018, ArXiv.

[17]  Tao Zhang,et al.  A Survey of Model Compression and Acceleration for Deep Neural Networks , 2017, ArXiv.

[18]  Eugenio Culurciello,et al.  ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[19]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Eduardo Romera,et al.  ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation , 2018, IEEE Transactions on Intelligent Transportation Systems.

[23]  Jae Myung Kim,et al.  Efficient Semantic Segmentation Using Spatio-Channel Dilated Convolutions , 2019, IEEE Access.

[24]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[25]  Junmo Kim,et al.  A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Martin Jägersand,et al.  Deep semantic segmentation for automated driving: Taxonomy, roadmap and challenges , 2017, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC).

[28]  Chih-Yang Lin,et al.  Global-and-Local Context Network for Semantic Segmentation of Street View Images , 2020, Sensors.

[29]  François Fleuret,et al.  Knowledge Transfer with Jacobian Matching , 2018, ICML.

[30]  Charless C. Fowlkes,et al.  Laplacian Reconstruction and Refinement for Semantic Segmentation , 2016, ArXiv.

[31]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[32]  Nikos Komodakis,et al.  Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer , 2016, ICLR.

[33]  Wei-Shi Zheng,et al.  Improving Fast Segmentation With Teacher-Student Learning , 2018, BMVC.

[34]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Christopher Zach,et al.  ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time , 2018, BMVC.

[36]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[39]  Roberto Cipolla,et al.  Fast-SCNN: Fast Semantic Segmentation Network , 2019, BMVC.

[40]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[41]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Haibo Wang,et al.  Efficient Fast Semantic Segmentation Using Continuous Shuffle Dilated Convolutions , 2020, IEEE Access.

[43]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Seung-Ho Lee,et al.  Novel Method of Semantic Segmentation Applicable to Augmented Reality , 2020, Sensors.