Paper information

Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

Training deep networks with limited labeled data while achieving a strong generalization ability is key in the quest to reduce human annotation efforts. This is the goal of semi-supervised learning, which exploits more widely available unlabeled data to complement small labeled data sets. In this paper, we propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels. Concretely, we learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images supplemented with only a few labeled ones. We build our architecture on top of StyleGAN2 [45], augmented with a label synthesis branch. Image labeling at test time is achieved by first embedding the target image into the joint latent space via an encoder network and test-time optimization, and then generating the label from the inferred embedding. We evaluate our approach in two important domains: medical image segmentation and part-based face segmentation. We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization, such as transferring from CT to MRI in medical imaging, and from photographs of real faces to paintings, sculptures, and even cartoons and animal faces. Project Page: https://nv-tlabs.github.io/semanticGAN/
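The test-time labeling procedure described in the abstract (encode the image, refine the latent code by optimization, then read the label off the label branch) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `ToyJointGenerator` is a hypothetical linear stand-in for the StyleGAN2-based generator, `encode` is a crude closed-form guess rather than a trained encoder network, and a plain squared-error reconstruction loss replaces whatever objective the paper actually uses. It only shows the shape of the pipeline, not the authors' implementation.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class ToyJointGenerator:
    """Hypothetical linear generator: one latent code w drives both an
    image branch and a label branch (stand-in for the real GAN)."""

    def __init__(self, n_pixels=32, latent_dim=4, n_classes=3, seed=0):
        rng = random.Random(seed)
        # Image branch: one weight row per pixel.
        self.A = [[rng.gauss(0, 1) for _ in range(latent_dim)]
                  for _ in range(n_pixels)]
        # Label branch: for each pixel, one weight row per class.
        self.B = [[[rng.gauss(0, 1) for _ in range(latent_dim)]
                   for _ in range(n_classes)] for _ in range(n_pixels)]

    def image(self, w):
        return [dot(row, w) for row in self.A]

    def label(self, w):
        # Per-pixel class is the argmax over that pixel's class logits.
        labels = []
        for class_rows in self.B:
            logits = [dot(row, w) for row in class_rows]
            labels.append(logits.index(max(logits)))
        return labels


def recon_loss(gen, w, x):
    """Squared-error reconstruction loss between generated and target image."""
    return 0.5 * sum((g - t) ** 2 for g, t in zip(gen.image(w), x))


def encode(gen, x):
    """Crude encoder stand-in: A^T x / n_pixels as an initial latent guess."""
    n = len(gen.A)
    return [dot(col, x) / n for col in zip(*gen.A)]


def refine(gen, w, x, steps=2000, lr=0.005):
    """Test-time optimization: gradient descent on the reconstruction loss."""
    w = list(w)
    for _ in range(steps):
        residual = [g - t for g, t in zip(gen.image(w), x)]
        grad = [dot(col, residual) for col in zip(*gen.A)]  # A^T (A w - x)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def segment(gen, x):
    """Embed image x into the latent space, then generate its label map."""
    w0 = encode(gen, x)      # step 1: encoder initialization
    w = refine(gen, w0, x)   # step 2: test-time optimization
    return gen.label(w), w0, w  # step 3: label from the inferred embedding


gen = ToyJointGenerator()
x = gen.image([0.5, -1.0, 2.0, 0.3])  # a target image the generator can express
labels, w0, w = segment(gen, x)
print("loss after encoder:", recon_loss(gen, w0, x))
print("loss after refinement:", recon_loss(gen, w, x))
```

In this toy setting the encoder only gives a rough starting point and the optimization step closes the remaining reconstruction gap, which mirrors why the abstract pairs the two; the label is never predicted directly from pixels, only generated from the recovered latent code.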

[1] Tero Karras,et al. Training Generative Adversarial Networks with Limited Data , 2020, NeurIPS.

[2] Jose Dolz,et al. Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning , 2018, ArXiv.

[3] Naciye Sinem Gezer,et al. Comparison of semi-automatic and deep learning-based automatic methods for liver segmentation in living liver transplant donors. , 2020, Diagnostic and interventional radiology.

[4] Subarna Tripathi,et al. Precise Recovery of Latent Vectors from Generative Adversarial Networks , 2017, ICLR.

[5] Lingyun Wu,et al. MaskGAN: Towards Diverse and Interactive Facial Image Manipulation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Shin Ishii,et al. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Céline Hudelot,et al. Controlling generative models with continuous factors of variations , 2020, ICLR.

[8] Sungroh Yoon,et al. FickleNet: Weakly and Semi-Supervised Semantic Image Segmentation Using Stochastic Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Noel C. F. Codella,et al. Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC) , 2019, ArXiv.

[10] Michal Valko,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.

[11] Sanja Fidler,et al. Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation , 2020, ECCV.

[12] Daniel Cohen-Or,et al. Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Geoffrey E. Hinton,et al. To recognize shapes, first learn to generate images. , 2007, Progress in brain research.

[14] Minh N. Do,et al. Semantic Image Inpainting with Deep Generative Models , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Jaakko Lehtinen,et al. Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Quoc V. Le,et al. Unsupervised Data Augmentation for Consistency Training , 2019, NeurIPS.

[17] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Sanja Fidler,et al. Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering , 2021, ICLR.

[19] Sanja Fidler,et al. Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D , 2020, ECCV.

[20] Yoshua Bengio,et al. Semi-supervised Learning by Entropy Minimization , 2004, CAP.

[21] Eduardo Valle,et al. Handling Inter-Annotator Agreement for Automated Skin Lesion Segmentation , 2019, ArXiv.

[22] Andreas Nürnberger,et al. CHAOS Challenge - Combined (CT-MR) Healthy Abdominal Organ Segmentation , 2020, Medical Image Anal..

[23] Anima Anandkumar,et al. Neural Networks with Recurrent Generative Feedback , 2020, NeurIPS.

[24] Andrew Brock,et al. Neural Photo Editing with Introspective Adversarial Networks , 2016, ICLR.

[25] Bolei Zhou,et al. Image Processing Using Multi-Code GAN Prior , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Augustus Odena,et al. Semi-Supervised Learning with Generative Adversarial Networks , 2016, ArXiv.

[27] Alexander G. Schwing,et al. Co-Generation with GANs using AIS based HMC , 2019, NeurIPS.

[28] Alexei A. Efros,et al. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29] Aaron C. Courville,et al. Adversarially Learned Inference , 2016, ICLR.

[30] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31] Geoffrey E. Hinton,et al. Lookahead Optimizer: k steps forward, 1 step back , 2019, NeurIPS.

[32] Yuxing Tang,et al. XLSor: A Robust and Accurate Lung Segmentor on Chest X-Rays Using Criss-Cross Attention and Customized Radiorealistic Abnormalities Generation , 2018, MIDL.

[33] George Shih,et al. A patient-centric dataset of images and metadata for identifying melanomas using clinical context , 2020, Scientific Data.

[34] Jaakko Lehtinen,et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[35] Anil A. Bharath,et al. Inverting the Generator of a Generative Adversarial Network , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[36] Stefan Jaeger,et al. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. , 2014, Quantitative imaging in medicine and surgery.

[37] Bram van Ginneken,et al. Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database , 2006, Medical Image Anal..

[38] Liyuan Liu,et al. On the Variance of the Adaptive Learning Rate and Beyond , 2019, ICLR.

[39] Ming-Hsuan Yang,et al. Adversarial Learning for Semi-supervised Semantic Segmentation , 2018, BMVC.

[40] Abhinav Gupta,et al. Scaling and Benchmarking Self-Supervised Visual Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41] Alejandro F. Frangi,et al. Federated Simulation for Medical Imaging , 2020, MICCAI.

[42] Xu Ji,et al. Invariant Information Clustering for Unsupervised Image Classification and Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43] Bolei Zhou,et al. Semantic photo manipulation with a generative image prior , 2019, ACM Trans. Graph..

[44] Nikos Komodakis,et al. Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.

[45] Ali Razavi,et al. Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.

[46] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.

[47] Wei Zeng,et al. Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation , 2018, 2018 IEEE 38th International Conference on Electronics and Nanotechnology (ELNANO).

[48] Michael I. Jordan,et al. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.

[49] K. Doi,et al. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists' detection of pulmonary nodules. , 2000, AJR. American journal of roentgenology.

[50] Minyoung Huh,et al. Transforming and Projecting Images into Class-conditional Generative Networks , 2020, ECCV.

[51] Bogdan Raducanu,et al. Invertible Conditional GANs for image editing , 2016, ArXiv.

[52] Quoc V. Le,et al. Meta Pseudo Labels , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53] Dong-Hyun Lee,et al. Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks , 2013 .

[54] Jong Chul Ye,et al. Deep Learning COVID-19 Features on CXR Using Limited Training Data Sets , 2020, IEEE Transactions on Medical Imaging.

[55] Thomas Brox,et al. Semi-Supervised Semantic Segmentation With High- and Low-Level Consistency , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56] Yong Zhou,et al. A survey of semi- and weakly supervised semantic segmentation of images , 2019, Artificial Intelligence Review.

[57] Timo Aila,et al. Semi-supervised semantic segmentation needs strong, varied perturbations , 2019, BMVC.

[58] Tim Salimans,et al. Milking CowMask for Semi-Supervised Image Classification , 2020, VISIGRAPP.

[59] Tolga Tasdizen,et al. Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning , 2016, NIPS.

[60] Laurens van der Maaten,et al. Self-Supervised Learning of Pretext-Invariant Representations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[61] Paolo Favaro,et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[62] David Berthelot,et al. FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence , 2020, NeurIPS.

[63] Sanja Fidler,et al. DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64] Harri Valpola,et al. Weight-averaged consistency targets improve semi-supervised deep learning results , 2017, ArXiv.

[65] Camille Couprie,et al. Semantic Segmentation using Adversarial Networks , 2016, NIPS 2016.

[66] Di Qiu,et al. Guided Collaborative Training for Pixel-wise Semi-Supervised Learning , 2020, ECCV.

[67] Geoffrey E. Hinton,et al. On deep generative models with applications to recognition , 2011, CVPR 2011.

[68] Sebastian Nowozin,et al. Which Training Methods for GANs do actually Converge? , 2018, ICML.

[69] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[70] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[71] Taesung Park,et al. Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[72] Jeffrey Glaister,et al. Automatic segmentation of skin lesions from dermatological photographs , 2013 .

[73] Yuqi Li,et al. GAN-Based Projector for Faster Recovery With Convergence Guarantees in Linear Inverse Problems , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[74] Max Welling,et al. Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[75] George Papandreou,et al. Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[76] Ronald M. Summers,et al. ChestX-ray: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases , 2019, Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics.

[77] Trevor Darrell,et al. Adversarial Feature Learning , 2016, ICLR.

[78] Bolei Zhou,et al. Seeing What a GAN Cannot Generate , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[79] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[80] Wenyu Liu,et al. Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[81] David Berthelot,et al. ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring , 2020, ICLR.

[82] Alexei A. Efros,et al. Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[83] Hao Liu,et al. Hybrid Discriminative-Generative Training via Contrastive Learning , 2020, ArXiv.

[84] David Berthelot,et al. MixMatch: A Holistic Approach to Semi-Supervised Learning , 2019, NeurIPS.

[85] Geoffrey E. Hinton,et al. Big Self-Supervised Models are Strong Semi-Supervised Learners , 2020, NeurIPS.

[86] Lennart Svensson,et al. ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[87] Klaus H. Maier-Hein,et al. nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation , 2018, Bildverarbeitung für die Medizin.

[88] Jan Kautz,et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[89] Deli Zhao,et al. In-Domain GAN Inversion for Real Image Editing , 2020, ECCV.

[90] Yuri Viazovetskyi,et al. StyleGAN2 Distillation for Feed-forward Image Manipulation , 2020, ECCV.

[91] Raja Bala,et al. Editing in Style: Uncovering the Local Semantics of GANs , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[92] Sanja Fidler,et al. Meta-Sim: Learning to Generate Synthetic Datasets , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[93] Mohammad Norouzi,et al. Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One , 2019, ICLR.

[94] Timo Aila,et al. A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[95] Peter Wonka,et al. Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[96] Hao Chen,et al. The Liver Tumor Segmentation Benchmark (LiTS) , 2019, Medical Image Anal..

[97] Yinghuan Shi,et al. WebCaricature: a benchmark for caricature recognition , 2017, BMVC.

[98] Sergio Casas,et al. End-To-End Interpretable Neural Motion Planner , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[99] Pedro M. Ferreira,et al. PH2 - A dermoscopic image database for research and benchmarking , 2013, 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).

[100] Alexei A. Efros,et al. Swapping Autoencoder for Deep Image Manipulation , 2020, NeurIPS.

[101] Concetto Spampinato,et al. Semi Supervised Semantic Segmentation Using Generative Adversarial Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[102] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).