Repurposing GANs for One-shot Semantic Part Segmentation

While GANs have shown success in realistic image generation, the idea of using GANs for other tasks unrelated to synthesis is underexplored. Do GANs learn meaningful structural parts of objects during their attempt to reproduce those objects? In this work, we test this hypothesis and propose a simple and effective approach based on GANs for semantic part segmentation that requires as few as one label example along with an unlabeled dataset. Our key idea is to leverage a trained GAN to extract a pixel-wise representation from the input image and use it as feature vectors for a segmentation network. Our experiments demonstrate that this GAN-derived representation is "readily discriminative" and produces surprisingly good results that are comparable to those from supervised baselines trained with significantly more labels. We believe this novel repurposing of GANs underlies a new class of unsupervised representation learning, which can generalize to many other tasks. More results are available at https://RepurposeGANs.github.io/.

[1]  Raja Bala,et al.  Editing in Style: Uncovering the Local Semantics of GANs , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yi Yang,et al.  SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation , 2018, IEEE Transactions on Cybernetics.

[3]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[4]  Shiguang Shan,et al.  AttGAN: Facial Attribute Editing by Only Changing What You Want , 2017, IEEE Transactions on Image Processing.

[5]  Jiashi Feng,et al.  PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Gregory Shakhnarovich,et al.  Colorization as a Proxy Task for Visual Understanding , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  David J. Crandall,et al.  Meta-Reinforced Synthetic Data for One-Shot Fine-Grained Visual Recognition , 2019, NeurIPS.

[8]  Liang Zheng,et al.  Correlating Edge, Pose With Parsing , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jeff Donahue,et al.  Large Scale Adversarial Representation Learning , 2019, NeurIPS.

[10]  Takeru Miyato,et al.  Spatially Controllable Image Synthesis with Internal Representation Collaging , 2018, 1811.10153.

[11]  Nicu Sebe,et al.  Motion-supervised Co-Part Segmentation , 2020, ArXiv.

[12]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[13]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  S. Tsogkas,et al.  Deep Learning for Semantic Part Segmentation with High-Level Guidance , 2015 .

[15]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[16]  Chen Sun,et al.  Unsupervised Discovery of Parts, Structure, and Dynamics , 2019, ICLR.

[17]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[18]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Eric P. Xing,et al.  Few-Shot Semantic Segmentation with Prototype Learning , 2018, BMVC.

[20]  Martin Jägersand,et al.  One-Shot Weakly Supervised Video Object Segmentation , 2019, ArXiv.

[21]  Yonghong Tian,et al.  Ordinal Multi-Task Part Segmentation With Recurrent Prior Generation , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Jaakko Lehtinen,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Thomas Brox,et al.  Discriminative Unsupervised Feature Learning with Convolutional Neural Networks , 2014, NIPS.

[24]  Artem Babenko,et al.  Unsupervised Discovery of Interpretable Directions in the GAN Latent Space , 2020, ICML.

[25]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[26]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[29]  Ersin Yumer,et al.  Neural Face Editing with Intrinsic Image Disentangling , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yedid Hoshen,et al.  Style Generator Inversion for Image Enhancement and Animation , 2019, ArXiv.

[31]  Alan L. Yuille,et al.  Joint Object and Part Segmentation Using Deep Learned Potentials , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[33]  Qingyang Xiao,et al.  Pose-Guided Knowledge Transfer for Object Part Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[34]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[35]  Jan Kautz,et al.  NVAE: A Deep Hierarchical Variational Autoencoder , 2020, NeurIPS.

[36]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[37]  Alexey Shvets,et al.  TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation , 2018, Computer-Aided Analysis of Gastrointestinal Videos.

[38]  Koray Kavukcuoglu,et al.  Pixel Recurrent Neural Networks , 2016, ICML.

[39]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Sanja Fidler,et al.  Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Xuming He,et al.  Part-aware Prototype Network for Few-shot Semantic Segmentation , 2020, ECCV.

[42]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[43]  Byron Boots,et al.  One-Shot Learning for Semantic Segmentation , 2017, BMVC.

[44]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[45]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Peter Wonka,et al.  Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[48]  Cewu Lu,et al.  Weakly and Semi Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[50]  Pavlo Molchanov,et al.  SCOPS: Self-Supervised Co-Part Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Roger B. Grosse,et al.  Isolating Sources of Disentanglement in Variational Autoencoders , 2018, NeurIPS.

[52]  Lingyun Wu,et al.  MaskGAN: Towards Diverse and Interactive Facial Image Manipulation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[55]  Alex Graves,et al.  Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.

[56]  Mehdi Rezagholiradeh,et al.  Reg-Gan: Semi-Supervised Learning Based on Generative Adversarial Networks for Regression , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).