GAN-Supervised Dense Visual Alignment

We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end. We apply our framework to the dense visual alignment problem. Inspired by the classic Congealing method, our GAN gealing algorithm trains a Spatial Transformer to map random samples from a GAN trained on unaligned data to a common, jointly-learned target mode. We show results on eight datasets, all of which demonstrate our method successfully aligns complex data and discovers dense correspondences. GANgealing significantly outperforms past self-supervised correspondence algorithms and performs on-par with (and sometimes exceeds) state-of-the-art supervised correspondence algorithms on several datasets-without making use of any correspondence supervision or data augmentation and despite being trained exclusively on GAN-generated data. For precise correspondence, we improve upon state-of-the-art supervised methods by as much as 3 ×. We show applications of our method for augmented reality, image editing and automated pre-processing of image datasets for downstream GAN training.

[1]  Yonglong Tian,et al.  Generative Models as a Data Source for Multiview Representation Learning , 2021, International Conference on Learning Representations.

[2]  Andrea Vedaldi,et al.  Finding an Unsupervised Image Segmenter in Each of Your Deep Generative Models , 2021, ICLR.

[3]  Tian Han,et al.  Deformable Generator Networks: Unsupervised Disentanglement of Appearance and Geometry , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  A. Torralba,et al.  Learning to See by Looking at Noise , 2021, NeurIPS.

[5]  Seungryong Kim,et al.  CATs: Cost Aggregation Transformers for Visual Correspondence , 2021, NeurIPS.

[6]  Dimitris N. Metaxas,et al.  A Good Image Generator Is What You Need for High-Resolution Video Synthesis , 2021, ICLR.

[7]  Eli Shechtman,et al.  Ensembling with Deep Generative Views , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ming-Yu Liu,et al.  GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Sanja Fidler,et al.  DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Daniel Cohen-Or,et al.  ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Minsu Cho,et al.  Convolutional Hough Matching Networks , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Supasorn Suwajanakorn,et al.  Repurposing GANs for One-Shot Semantic Part Segmentation , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Hao Wang,et al.  Generative Interventions for Causal Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Anil K. Jain,et al.  Lifting 2D StyleGAN for 3D-Aware Face Generation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Chen Change Loy,et al.  Do 2D GANs Know 3D Shape? Unsupervised 3D shape reconstruction from 2D Image GANs , 2020, ICLR.

[16]  A. Torralba,et al.  Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering , 2020, ICLR.

[17]  Seungryong Kim,et al.  Semantic Correspondence with Transformers , 2021, ArXiv.

[18]  Graham W. Taylor,et al.  Instance Selection for GANs , 2020, NeurIPS.

[19]  Jean Ponce,et al.  Learning to Compose Hypercolumns for Visual Correspondence , 2020, ECCV.

[20]  Abhinav Gupta,et al.  Implicit Mesh Reconstruction from Unannotated Image Collections , 2020, ArXiv.

[21]  Mathieu Aubry,et al.  Deep Transformation-Invariant Clustering , 2020, NeurIPS.

[22]  Artem Babenko,et al.  Big GANs Are Watching You: Towards Unsupervised Object Segmentation with Off-the-Shelf Generative Models , 2020, ArXiv.

[23]  Makoto Yamada,et al.  Semantic Correspondence as an Optimal Transport Problem , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Tero Karras,et al.  Training Generative Adversarial Networks with Limited Data , 2020, NeurIPS.

[25]  Minyoung Huh,et al.  Transforming and Projecting Images into Class-conditional Generative Networks , 2020, ECCV.

[26]  Aaron Hertzmann,et al.  GANSpace: Discovering Interpretable GAN Controls , 2020, NeurIPS.

[27]  Abhinav Gupta,et al.  Articulation-Aware Canonical Surface Mapping , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Jia Deng,et al.  RAFT: Recurrent All-Pairs Field Transforms for Optical Flow , 2020, ECCV.

[29]  Jinwoo Shin,et al.  Freeze the Discriminator: a Simple Baseline for Fine-Tuning GANs , 2020, 2002.10964.

[30]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[31]  Jung-Woo Ha,et al.  StarGAN v2: Diverse Image Synthesis for Multiple Domains , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Tero Karras,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Matthieu Cord,et al.  This Dataset Does Not Exist: Training Models from Generated Images , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[34]  Bolei Zhou,et al.  Seeing What a GAN Cannot Generate , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Jean Ponce,et al.  SPair-71k: A Large-scale Benchmark for Semantic Correspondence , 2019, ArXiv.

[36]  Jean Ponce,et al.  Hyperpixel Flow: Semantic Correspondence With Multi-Layer Neural Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[37]  Shubham Tulsiani,et al.  Canonical Surface Mapping via Geometric Cycle Consistency , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Bolei Zhou,et al.  Semantic photo manipulation with a generative image prior , 2019, ACM Trans. Graph..

[39]  Jeff Donahue,et al.  Large Scale Adversarial Representation Learning , 2019, NeurIPS.

[40]  Claus Aranha,et al.  Data Augmentation Using GANs , 2019, ArXiv.

[41]  Peter Wonka,et al.  Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[42]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Tomás Pajdla,et al.  Neighbourhood Consensus Networks , 2018, NeurIPS.

[44]  Bohyung Han,et al.  Attentive Semantic Alignment with Offset-Aware Correlation Kernels , 2018, ECCV.

[45]  David Cox,et al.  Conditional Infilling GANs for Data Augmentation in Mammogram Classification , 2018, RAMBO+BIA+TIA@MICCAI.

[46]  Ersin Yumer,et al.  ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Josef Sivic,et al.  End-to-End Weakly-Supervised Semantic Alignment , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[50]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[51]  Andrea Vedaldi,et al.  Unsupervised object learning from dense equivariant image labelling , 2017, NIPS 2017.

[52]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[54]  Josef Sivic,et al.  Convolutional Neural Network Architecture for Geometric Matching , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Ersin Yumer,et al.  Transformation-Grounded Image Generation Network for Novel 3D View Synthesis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Simon Lucey,et al.  Inverse Compositional Spatial Transformer Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Andrew Brock,et al.  Neural Photo Editing with Introspective Adversarial Networks , 2016, ICLR.

[59]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[60]  Aaron C. Courville,et al.  Adversarially Learned Inference , 2016, ICLR.

[61]  Trevor Darrell,et al.  Adversarial Feature Learning , 2016, ICLR.

[62]  Honglak Lee,et al.  Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision , 2016, NIPS.

[63]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[64]  Victor S. Lempitsky,et al.  DeepWarp: Photorealistic Image Resynthesis for Gaze Manipulation , 2016, ECCV.

[65]  Luc Van Gool,et al.  Dynamic Filter Networks , 2016, NIPS.

[66]  Jitendra Malik,et al.  View Synthesis by Appearance Flow , 2016, ECCV.

[67]  David W. Jacobs,et al.  WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[68]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[69]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[71]  Yinda Zhang,et al.  LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop , 2015, ArXiv.

[72]  Pietro Perona,et al.  Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Yong Jae Lee,et al.  FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[75]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[76]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[77]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[78]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[79]  Yong Jae Lee,et al.  AverageExplorer: interactive exploration and alignment of visual data collections , 2014, ACM Trans. Graph..

[80]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Hossein Mobahi,et al.  A Compositional Model for Low-Dimensional Image Set Representation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[83]  Allen R. Hanson,et al.  Unsupervised Joint Alignment and Clustering using Bayesian Nonparametrics , 2012, UAI.

[84]  Ira Kemelmacher-Shlizerman,et al.  Collection flow , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[85]  Alexei A. Efros,et al.  Object Instance Sharing by Enhanced Bounding Box Correspondence , 2012, BMVC.

[86]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[87]  John Wright,et al.  RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[88]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[89]  Erik G. Learned-Miller,et al.  Unsupervised Joint Alignment of Complex Images , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[90]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[91]  Erik G. Learned-Miller,et al.  Data driven image models through continuous joint alignment , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[92]  Simon Baker,et al.  Lucas-Kanade 20 Years On: A Unifying Framework , 2004, International Journal of Computer Vision.

[93]  B. Frey,et al.  Transformation-Invariant Clustering Using the EM Algorithm , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[94]  Brendan J. Frey,et al.  Estimating mixture models of images and inferring spatial transformations using the EM algorithm , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[95]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.