Paper Information - BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations

Annotating images with pixel-wise labels is a time-consuming and costly process. Recently, DatasetGAN [82] showcased a promising alternative: synthesizing a large labeled dataset via a generative adversarial network (GAN) by exploiting a small set of manually labeled, GAN-generated images. Here, we scale DatasetGAN to the ImageNet scale of class diversity. We take image samples from the class-conditional generative model BigGAN [6] trained on ImageNet, and manually annotate only 5 images per class, for all 1k classes. By training an effective feature segmentation architecture on top of BigGAN, we turn BigGAN into a labeled dataset generator. We further show that VQGAN [19] can similarly serve as a dataset generator, leveraging the already annotated data. We create a new ImageNet benchmark by labeling an additional set of real images and evaluate segmentation performance in a variety of settings. Through an extensive ablation study, we show big gains in leveraging a large generated dataset to train different supervised and self-supervised backbone models on pixel-wise tasks. Furthermore, we demonstrate that using our synthesized datasets for pre-training leads to improvements over standard ImageNet pre-training on several downstream datasets, such as PASCAL-VOC, MS-COCO, Cityscapes and chest X-ray, as well as tasks (detection, segmentation). Our benchmark will be made public, and we will maintain a leaderboard for this challenging task. Project Page: https://nv-tlabs.github.io/big-datasetgan/
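The core idea in the abstract, training a small segmentation head on generator features from a handful of annotated samples and then labeling freshly sampled images with it, can be illustrated with a toy sketch. This is not the paper's architecture and does not use BigGAN or VQGAN. All names (`toy_generator`, `train_head`, `synthesize_labeled_dataset`, `pixel_accuracy`) are hypothetical. The "generator" is a hand-written function that emits per-pixel features, and a per-pixel logistic-regression classifier stands in for the feature segmentation architecture.

```python
import math
import random

GRID = 8  # toy images are GRID x GRID pixels


def toy_generator(seed):
    """Hypothetical stand-in for a GAN generator (assumption, not BigGAN).

    Returns (features, mask): one 2-D feature vector per pixel plus the
    foreground mask that a human annotator would draw for this sample.
    """
    rng = random.Random(seed)
    cx, cy = rng.uniform(2, 5), rng.uniform(2, 5)  # object centre
    features, mask = [], []
    for y in range(GRID):
        for x in range(GRID):
            d = math.hypot(x - cx, y - cy)
            # Feature 0 tracks distance to the object; feature 1 is pure noise.
            features.append((d + rng.gauss(0, 0.05), rng.gauss(0, 1)))
            mask.append(1 if d < 2.5 else 0)
    return features, mask


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(annotated, epochs=200, lr=0.05):
    """Fit a per-pixel logistic-regression head with plain SGD."""
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        for features, mask in annotated:
            for (f0, f1), target in zip(features, mask):
                grad = sigmoid(w0 * f0 + w1 * f1 + b) - target
                w0 -= lr * grad * f0
                w1 -= lr * grad * f1
                b -= lr * grad
    return w0, w1, b


def predict_mask(head, features):
    w0, w1, b = head
    return [1 if w0 * f0 + w1 * f1 + b > 0 else 0 for f0, f1 in features]


def synthesize_labeled_dataset(head, seeds):
    """Sample new toy 'images' and label them with the trained head."""
    dataset = []
    for seed in seeds:
        features, true_mask = toy_generator(seed)
        dataset.append((features, predict_mask(head, features), true_mask))
    return dataset


def pixel_accuracy(dataset):
    correct = total = 0
    for _, predicted, true_mask in dataset:
        correct += sum(p == t for p, t in zip(predicted, true_mask))
        total += len(true_mask)
    return correct / total


if __name__ == "__main__":
    # Five annotated samples, echoing the five-images-per-class budget.
    head = train_head([toy_generator(seed) for seed in range(5)])
    synthetic = synthesize_labeled_dataset(head, range(100, 120))
    print(f"pixel accuracy on unseen samples: {pixel_accuracy(synthetic):.3f}")
```

In this toy setting the true mask is available for every sample, which is what makes `pixel_accuracy` computable. In the setting the abstract describes, only the few manually annotated samples carry human labels, and the predicted masks themselves become the annotations of the synthesized dataset.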

[1] Behnam Neyshabur, et al. The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers, 2021, ICLR.

[2] Sanja Fidler, et al. Fast Interactive Object Annotation With Curve-GCN, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Hao Su, et al. A Point Set Generation Network for 3D Object Reconstruction from a Single Image, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Alexei A. Efros, et al. What makes ImageNet good for transfer learning?, 2016, ArXiv.

[5] Sanja Fidler, et al. Beat the MTurkers: Automatic Image Labeling from Weak 3D Supervision, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Luc Van Gool, et al. Deep Extreme Cut: From Extreme Points to Object Segmentation, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8] Trevor Darrell, et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Aaron C. Courville, et al. Unsupervised Learning of Dense Visual Representations, 2020, NeurIPS.

[10] David H. Douglas, et al. Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or Its Caricature, 1973.

[11] Julien Mairal, et al. Emerging Properties in Self-Supervised Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[12] Supasorn Suwajanakorn, et al. Repurposing GANs for One-shot Semantic Part Segmentation, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Ross B. Girshick, et al. Focal Loss for Dense Object Detection, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Sanja Fidler, et al. Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation, 2020, ECCV.

[15] Raymond J. Mooney, et al. Active Learning for Probability Estimation Using Jensen-Shannon Divergence, 2005, ECML.

[16] Konstantin Sofiiuk, et al. Learning High-Resolution Domain-Specific Representations with a GAN Generator, 2020, S+SSPR.

[17] Qiao Wang, et al. Virtual Worlds as Proxy for Multi-object Tracking Analysis, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Antonio M. López, et al. The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Chun-Fu Chen, et al. A Broad Study on the Transferability of Visual Representations with Contrastive Learning, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[20] Sanja Fidler, et al. Object Instance Annotation With Deep Extreme Level Set Evolution, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[22] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.

[23] Kaiming He, et al. Momentum Contrast for Unsupervised Visual Representation Learning, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Sanja Fidler, et al. VirtualHome: Simulating Household Activities Via Programs, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26] Germán Ros, et al. CARLA: An Open Urban Driving Simulator, 2017, CoRL.

[27] Sanja Fidler, et al. Meta-Sim: Learning to Generate Synthetic Datasets, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28] Ali Razavi, et al. Generating Diverse High-Fidelity Images with VQ-VAE-2, 2019, NeurIPS.

[29] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.

[30] Jian Sun, et al. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Antonio Torralba, et al. Nonparametric scene parsing: Label transfer via dense scene alignment, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[32] Jaakko Lehtinen, et al. Analyzing and Improving the Image Quality of StyleGAN, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Sebastian Ramos, et al. The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.

[35] Michal Valko, et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, 2020, NeurIPS.

[36] Ambrish Tyagi, et al. Box2Seg: Attention Weighted Loss and Discriminative Feature Learning for Weakly Supervised Segmentation, 2020, ECCV.

[37] Changxi Zheng, et al. Linear Semantics in Generative Adversarial Networks, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Tao Kong, et al. Dense Contrastive Learning for Self-Supervised Visual Pre-Training, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Timo Aila, et al. A Style-Based Generator Architecture for Generative Adversarial Networks, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40] Vladlen Koltun, et al. Playing for Data: Ground Truth from Computer Games, 2016, ECCV.

[41] Geoffrey E. Hinton, et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.

[42] Jan Kautz, et al. Multimodal Unsupervised Image-to-Image Translation, 2018, ECCV.

[43] Kurt Keutzer, et al. Region Similarity Representation Learning, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[44] Kaiming He, et al. Improved Baselines with Momentum Contrastive Learning, 2020, ArXiv.

[45] Stefan Jaeger, et al. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases, 2014, Quantitative imaging in medicine and surgery.

[46] Yuke Zhu, et al. DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[47] Saining Xie, et al. An Empirical Study of Training Self-Supervised Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[48] Alejandro F. Frangi, et al. Federated Simulation for Medical Imaging, 2020, MICCAI.

[49] Sanja Fidler, et al. DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[51] Wei Zeng, et al. Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation, 2018, 2018 IEEE 38th International Conference on Electronics and Nanotechnology (ELNANO).

[52] K. Doi, et al. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists' detection of pulmonary nodules, 2000, AJR. American journal of roentgenology.

[53] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[54] George Papandreou, et al. Rethinking Atrous Convolution for Semantic Image Segmentation, 2017, ArXiv.

[55] Sanja Fidler, et al. Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56] Arthur Gretton, et al. Demystifying MMD GANs, 2018, ICLR.

[57] Jaakko Lehtinen, et al. Alias-Free Generative Adversarial Networks, 2021, NeurIPS.

[58] Andreas Nürnberger, et al. The Power of Ensembles for Active Learning in Image Classification, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[59] Svetlana Lazebnik, et al. Superparsing, 2010, International Journal of Computer Vision.

[60] Kaiming He, et al. Focal Loss for Dense Object Detection, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[61] Jeff Donahue, et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018, ICLR.

[62] Jitendra Malik, et al. Cost-Sensitive Active Learning for Intracranial Hemorrhage Detection, 2018, MICCAI.

[63] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[64] Amir Rosenfeld, et al. Extracting foreground masks towards object recognition, 2011, 2011 International Conference on Computer Vision.

[65] Luc Van Gool, et al. The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.

[66] Kate Saenko, et al. VisDA: The Visual Domain Adaptation Challenge, 2017, ArXiv.

[67] Patrick Esser, et al. Taming Transformers for High-Resolution Image Synthesis, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[68] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[69] Matthieu Guillaumin, et al. ImageNet Auto-Annotation with Segmentation Propagation, 2014, International Journal of Computer Vision.

[70] Ross B. Girshick, et al. Mask R-CNN, 2017, arXiv:1703.06870.

[71] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.

[72] Andrew Blake, et al. "GrabCut", 2004, ACM Trans. Graph.

[73] Yuri Boykov, et al. Normalized Cut Loss for Weakly-Supervised CNN Segmentation, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[74] Sanja Fidler, et al. Devil Is in the Edges: Learning Semantic Boundaries From Noisy Annotations, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).