Paper Information - GAN-Supervised Dense Visual Alignment

GAN-Supervised Dense Visual Alignment

We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end. We apply our framework to the dense visual alignment problem. Inspired by the classic Congealing method, our GANgealing algorithm trains a Spatial Transformer to map random samples from a GAN trained on unaligned data to a common, jointly-learned target mode. We show results on eight datasets, all of which demonstrate that our method successfully aligns complex data and discovers dense correspondences. GANgealing significantly outperforms past self-supervised correspondence algorithms and performs on par with (and sometimes exceeds) state-of-the-art supervised correspondence algorithms on several datasets, without making use of any correspondence supervision or data augmentation and despite being trained exclusively on GAN-generated data. For precise correspondence, we improve upon state-of-the-art supervised methods by as much as 3×. We show applications of our method for augmented reality, image editing and automated pre-processing of image datasets for downstream GAN training.
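The abstract does not say how the Spatial Transformer parameterizes or applies its warp. As a rough illustration only, the sketch below shows the generic resampling step that Spatial Transformer-style modules perform: an affine matrix maps each output pixel to a location in the input, which is then sampled bilinearly. The function name `affine_warp`, the 2×3 matrix `theta`, the normalized [-1, 1] coordinate convention and the zero padding are all assumptions made for this example; it is not the authors' implementation and does not include the GAN or the jointly learned target mode.

```python
import math


def affine_warp(image, theta):
    """Warp a 2-D grayscale image (list of rows) with a 2x3 affine matrix.

    Hypothetical helper: theta maps normalized output coordinates (x, y)
    in [-1, 1] to normalized input coordinates. The input is then sampled
    bilinearly, with zeros outside the image. Needs at least 2x2 pixels.
    """
    h, w = len(image), len(image[0])

    def pixel(r, c):
        # Zero padding for samples that fall outside the image.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Normalized coordinates of the output pixel.
            x = 2.0 * j / (w - 1) - 1.0
            y = 2.0 * i / (h - 1) - 1.0
            # Affine map into the input image's normalized frame.
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            # Convert back to (fractional) pixel indices.
            c = (xs + 1.0) * (w - 1) / 2.0
            r = (ys + 1.0) * (h - 1) / 2.0
            c0, r0 = math.floor(c), math.floor(r)
            dc, dr = c - c0, r - r0
            # Bilinear interpolation over the four neighbouring pixels.
            out[i][j] = (
                pixel(r0, c0) * (1 - dr) * (1 - dc)
                + pixel(r0, c0 + 1) * (1 - dr) * dc
                + pixel(r0 + 1, c0) * dr * (1 - dc)
                + pixel(r0 + 1, c0 + 1) * dr * dc
            )
    return out


# Usage: the identity matrix leaves the image unchanged; a horizontal
# offset of 1.0 in normalized units shifts a 3-pixel-wide image by one pixel.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unchanged = affine_warp(image, [[1, 0, 0], [0, 1, 0]])
shifted = affine_warp(image, [[1, 0, 1.0], [0, 1, 0]])
```

Because every step is a smooth function of `theta`, a warp of this kind can be trained by gradient descent in an autodiff framework, which is what makes it usable as a learnable alignment module.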
