Improving Performance of Semantic Segmentation CycleGANs by Noise Injection into the Latent Segmentation Space

In recent years, semantic segmentation has taken benefit from various works in computer vision. Inspired by the very versatile CycleGAN architecture, we combine semantic segmentation with the concept of cycle consistency to enable a multitask training protocol. However, learning is largely prevented by the so-called steganography effect, which expresses itself as watermarks in the latent segmentation domain, making image reconstruction a too easy task. To combat this, we propose a noise injection, based either on quantization noise or on Gaussian noise addition to avoid this disadvantageous information flow in the cycle architecture. We find that noise injection significantly reduces the generation of watermarks and thus allows the recognition of highly relevant classes such as “traffic signs”, which are hardly detected by the ERFNet baseline. We report mIoU and PSNR results on the Cityscapes dataset for semantic segmentation and image reconstruction, respectively. The proposed methodology allows to achieve an mIoU improvement on the Cityscapes validation set of 5.7 % absolute over the same CycleGAN without noise injection, and still an absolute 4.9 % over the ERFNet non-cyclic baseline.

[1]  Piotr Bilinski,et al.  Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Steve Branson,et al.  Learned Video Compression , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  David Salomon,et al.  Data Compression: The Complete Reference , 2006 .

[4]  Christian Desrosiers,et al.  Revisiting CycleGAN for semi-supervised segmentation , 2019, ArXiv.

[5]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Luca Benini,et al.  Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations , 2017, NIPS.

[7]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Fabio Pizzati,et al.  Domain Bridge for Unpaired Image-to-Image Translation and Unsupervised Domain Adaptation , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[9]  Nir Shavit,et al.  Generative Compression , 2017, 2018 Picture Coding Symposium (PCS).

[10]  David Minnen,et al.  Variable Rate Image Compression with Recurrent Neural Networks , 2015, ICLR.

[11]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[12]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Ganesh K. Venayagamoorthy,et al.  Neural networks based non-uniform scalar quantizer design with particle swarm optimization , 2005, Proceedings 2005 IEEE Swarm Intelligence Symposium, 2005. SIS 2005..

[14]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[15]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Chih-Yuan Yang,et al.  Learning a No-Reference Quality Metric for Single-Image Super-Resolution , 2016, Comput. Vis. Image Underst..

[17]  Houqiang Li,et al.  Quantization Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Lubomir D. Bourdev,et al.  Real-Time Adaptive Image Compression , 2017, ICML.

[19]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[20]  Peter Schlicht,et al.  Robust Semantic Segmentation by Redundant Networks With a Layer-Specific Loss Contribution and Majority Vote , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[21]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[22]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[23]  Lucas Theis,et al.  Lossy Image Compression with Compressive Autoencoders , 2017, ICLR.

[24]  Jian Yang,et al.  MemNet: A Persistent Memory Network for Image Restoration , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Yann LeCun,et al.  Energy-based Generative Adversarial Network , 2016, ICLR.

[26]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[27]  Tim Fingscheidt,et al.  Focussing Learned Image Compression to Semantic Classes for V2X Applications , 2020, 2020 IEEE Intelligent Vehicles Symposium (IV).

[28]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[29]  Michael McComas,et al.  Steganography GAN: Cracking Steganography with Cycle Generative Adversarial Networks , 2020, ArXiv.

[30]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Lorenzo Porzi,et al.  In-place Activated BatchNorm for Memory-Optimized Training of DNNs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Peter Schlicht,et al.  Unsupervised Domain Adaptation to Improve Image Segmentation Quality Both in the Source and Target Domain , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[33]  Daniel Rueckert,et al.  A Deep Cascade of Convolutional Neural Networks for Dynamic MR Image Reconstruction , 2017, IEEE Transactions on Medical Imaging.

[34]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[35]  Minqing Zhang,et al.  Recent Advances of Image Steganography With Generative Adversarial Networks , 2019, IEEE Access.

[36]  Yoshua Bengio,et al.  Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation , 2013, ArXiv.

[37]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[40]  Peter Schlicht,et al.  Scalar and Vector Quantization for Learned Image Compression: A Study on the Effects of MSE and GAN Loss in Various Spaces , 2020, 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC).

[41]  David Zhang,et al.  A comprehensive evaluation of full reference image quality assessment algorithms , 2012, 2012 19th IEEE International Conference on Image Processing.

[42]  Antonio J. Plaza,et al.  Image Segmentation Using Deep Learning: A Survey , 2021, IEEE transactions on pattern analysis and machine intelligence.

[43]  Eduardo Romera,et al.  ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation , 2018, IEEE Transactions on Intelligent Transportation Systems.

[44]  Luc Van Gool,et al.  Generative Adversarial Networks for Extreme Learned Image Compression , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[45]  Robert M. Gray,et al.  Quantization noise spectra , 1990, IEEE Trans. Inf. Theory.

[46]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[47]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[48]  Wangmeng Zuo,et al.  Learning Deep CNN Denoiser Prior for Image Restoration , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Dengpan Ye,et al.  Heard More Than Heard: An Audio Steganography Method Based on GAN , 2019, ArXiv.

[50]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Peter Schlicht,et al.  On the Robustness of Redundant Teacher-Student Frameworks for Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[52]  Tim Fingscheidt,et al.  Towards Corner Case Detection for Autonomous Driving , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[53]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  R. Gray,et al.  Vector quantization , 1984, IEEE ASSP Magazine.

[55]  Mark Sandler,et al.  CycleGAN, a Master of Steganography , 2017, ArXiv.

[56]  Tim Fingscheidt,et al.  On Low-Bitrate Image Compression for Distributed Automotive Perception: Higher Peak SNR Does Not Mean Better Semantic Segmentation , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[57]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[58]  Jan Kautz,et al.  Loss Functions for Image Restoration With Neural Networks , 2017, IEEE Transactions on Computational Imaging.

[59]  Peter Schlicht,et al.  Self-Supervised Domain Mismatch Estimation for Autonomous Perception , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[60]  Peyman Milanfar,et al.  NIMA: Neural Image Assessment , 2017, IEEE Transactions on Image Processing.