DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

We introduce DatasetGAN: an automatic procedure to generate massive datasets of high-quality semantically segmented images with minimal human effort. Current deep networks are extremely data-hungry and benefit from training on large-scale datasets, which are time-consuming to annotate. Our method relies on the power of recent GANs to generate realistic images. We show how the GAN latent code can be decoded to produce a semantic segmentation of the image. Training the decoder requires only a few labeled examples to generalize to the rest of the latent space, yielding an effectively infinite generator of annotated data. The generated datasets can then be used to train any computer vision architecture, just as real datasets are. Because only a few images need to be manually segmented, it becomes possible to annotate them in extreme detail and to generate datasets with rich object and part segmentations. To showcase the power of our approach, we generated datasets for 7 image segmentation tasks, which include pixel-level labels for 34 human face parts and 32 car parts. Our approach significantly outperforms all semi-supervised baselines and is on par with fully supervised methods, which in some cases require as much as 100x more annotated data than our method.
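
The pipeline described above (generate an image, decode the generator's internal state into a per-pixel label, train that decoder on a handful of annotated samples, then sample new labeled images freely) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_generator` is a hypothetical stand-in for a pretrained GAN whose per-pixel features encode object-relative position, `toy_annotator` stands in for the human labeler, and the linear softmax decoder is a simple substitute, since the abstract does not specify the decoder architecture.

```python
import numpy as np

H = W = 16        # toy image size
RADIUS = 0.5      # object radius in normalized coordinates
N_CLASSES = 3     # 0 = background, 1 = left part, 2 = right part

ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")

def toy_generator(z):
    """Hypothetical stand-in for a pretrained GAN: latent z (2,) -> image, features."""
    cx, cy = 0.3 * np.tanh(z)              # the latent code controls object position
    dx, dy = xs - cx, ys - cy
    r2 = dx**2 + dy**2
    image = np.exp(-r2 / RADIUS**2)        # soft rendering of a disc-shaped object
    # Per-pixel features (assumed to carry semantic information) plus a bias term.
    feats = np.stack([dx, dy, r2, np.ones_like(r2)], axis=-1).reshape(-1, 4)
    return image, feats

def toy_annotator(feats):
    """Stand-in for a human annotator: labels each pixel from the scene geometry."""
    dx, r2 = feats[:, 0], feats[:, 2]
    labels = np.where(dx < 0, 1, 2)        # left vs. right half of the object
    labels[r2 > RADIUS**2] = 0             # everything outside the disc is background
    return labels

def train_decoder(feats, labels, steps=2000, lr=0.5):
    """Fit a linear softmax classifier on per-pixel features by gradient descent."""
    weights = np.zeros((feats.shape[1], N_CLASSES))
    onehot = np.eye(N_CLASSES)[labels]
    for _ in range(steps):
        logits = feats @ weights
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        weights -= lr * feats.T @ (probs - onehot) / len(feats)
    return weights

rng = np.random.default_rng(0)

# Step 1: "annotate" only a handful of generated images and train the decoder.
train_feats, train_labels = [], []
for z in rng.normal(size=(5, 2)):
    _, feats = toy_generator(z)
    train_feats.append(feats)
    train_labels.append(toy_annotator(feats))
decoder = train_decoder(np.concatenate(train_feats), np.concatenate(train_labels))

# Step 2: sample fresh latent codes and emit (image, mask) pairs.
def generate_labeled_sample(z):
    image, feats = toy_generator(z)
    mask = (feats @ decoder).argmax(axis=1).reshape(H, W)
    return image, mask

test_latents = rng.normal(size=(100, 2))
dataset = [generate_labeled_sample(z) for z in test_latents]

# Compare predicted masks with what the annotator would have drawn.
truth = [toy_annotator(toy_generator(z)[1]).reshape(H, W) for z in test_latents]
accuracy = np.mean([np.mean(mask == t) for (_, mask), t in zip(dataset, truth)])
print(f"pixel accuracy on unseen latents: {accuracy:.3f}")
```

In this toy setting the resulting `dataset` list plays the role of the synthetic labeled dataset, which could then be passed to any downstream segmentation model. With a real GAN, the features would instead come from the generator's intermediate activations.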
