On the Robustness of Redundant Teacher-Student Frameworks for Semantic Segmentation

The trend towards autonomous systems in today's technology comes with the need for environment perception. Deep neural networks (DNNs) constantly showed state-of-the-art performance over the last few years in visual machine perception, e.g., semantic segmentation. While DNNs work fine on uncorrupted data, recently introduced adversarial examples (AEs) led to misclassification with high confidence. This lack of robustness against such adversarial attacks questions the use of DNNs in safety-critical autonomous systems, e.g., autonomous driving vehicles. In this work, we address the mentioned problem with the use of a redundant teacher-student framework, consisting of a static teacher network (T), a static student network (S), and a constantly adapting student network (A). By using this triplet in combination with a novel inverse feature matching (IFM) loss, we show that a significant robustness increase of student DNNs against adversarial attacks is achieveable, while maintaining semantic segmentation quality at a reasonably high level. With our approach, we manage to increase the mean intersection over union (mean IoU) ratio between static student adversarial examples and clean images from about 35 % to about 80 % on the Cityscapes dataset. Moreover, our proposed method can be integrated into any DNN-based perception mechanism to increase the (online) robustness in an adversarial environment, created from static model knowledge.

[1]  Junmo Kim,et al.  A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[4]  Alan L. Yuille,et al.  Adversarial Examples for Semantic Segmentation and Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[6]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Rich Caruana,et al.  Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[9]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[10]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[11]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Piotr Bilinski,et al.  Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[15]  Yi Zhang,et al.  PSANet: Point-wise Spatial Attention Network for Scene Parsing , 2018, ECCV.

[16]  Thomas Brox,et al.  Universal Adversarial Perturbations Against Semantic Image Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[18]  Linda G. Shapiro,et al.  ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation , 2018, ECCV.

[19]  Philip H. S. Torr,et al.  On the Robustness of Semantic Segmentation Models to Adversarial Attacks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Rama Chellappa,et al.  Gaussian Conditional Random Field Network for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[23]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[24]  Thomas Brox,et al.  Adversarial Examples for Semantic Image Segmentation , 2017, ICLR.

[25]  Tim Fingscheidt,et al.  Towards Corner Case Detection for Autonomous Driving , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[26]  Shuai Yi,et al.  FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction , 2019, NeurIPS.

[27]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[28]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Samy Bengio,et al.  Adversarial examples in the physical world , 2016, ICLR.

[30]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Samy Bengio,et al.  Adversarial Machine Learning at Scale , 2016, ICLR.

[33]  Stella X. Yu,et al.  Adaptive Affinity Fields for Semantic Segmentation , 2018, ECCV.

[34]  Gang Wang,et al.  Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Lorenzo Porzi,et al.  In-place Activated BatchNorm for Memory-Optimized Training of DNNs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[37]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Kun Yu,et al.  DenseASPP for Semantic Segmentation in Street Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Gang Yu,et al.  BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[40]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Wen Gao,et al.  Dense Relation Network: Learning Consistent and Context-Aware Representation for Semantic Image Segmentation , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[42]  Nikos Komodakis,et al.  Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer , 2016, ICLR.

[43]  Eduardo Romera,et al.  ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation , 2018, IEEE Transactions on Intelligent Transportation Systems.

[44]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[45]  Yifan Gong,et al.  Learning small-size DNN with output-distribution-based criteria , 2014, INTERSPEECH.

[46]  Tim Fingscheidt,et al.  GAN- vs. JPEG2000 Image Compression for Distributed Automotive Perception: Higher Peak SNR Does Not Mean Better Semantic Segmentation , 2019, ArXiv.

[47]  George Papandreou,et al.  Searching for Efficient Multi-Scale Architectures for Dense Image Prediction , 2018, NeurIPS.

[48]  Rich Caruana,et al.  Model compression , 2006, KDD '06.