COCO-GAN: Generation by Parts via Conditional Coordinating

Humans can interact with only part of their surrounding environment due to biological restrictions, and therefore learn to reason about spatial relationships across a series of observations to piece the environment together. Inspired by this behavior, and by the fact that machines also face computational constraints, we propose COnditional COordinate GAN (COCO-GAN), in which the generator produces image patches conditioned on their spatial coordinates, while the discriminator judges the realism of assembled patches in terms of global coherence, local appearance, and continuity across patch edges. Although full images are never handled during training, we show that COCO-GAN produces state-of-the-art-quality full images at inference time. We further demonstrate a variety of novel applications enabled by this coordinate-aware framework. First, by extrapolating beyond the learned coordinate manifold, we generate off-the-boundary patches; combined with the originally generated full image, COCO-GAN can produce images larger than its training samples, which we call "beyond-boundary generation". We then showcase panorama generation within a cylindrical coordinate system that inherently preserves the horizontally cyclic topology. On the computational side, COCO-GAN's built-in divide-and-conquer paradigm reduces memory requirements during training and inference, offers high parallelism, and can generate parts of an image on demand.
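The generation-by-parts idea can be illustrated with a toy sketch. The real COCO-GAN generator is a trained neural network; here, as a stand-in assumption, we use a simple deterministic function of a shared latent vector `z` and global pixel coordinates, which makes the two key properties easy to see: adjacent patches assemble seamlessly because each patch is a restriction of one global function, and extrapolating the coordinate grid yields an image larger than the original canvas (the "beyond-boundary" setting). All function names below are hypothetical.

```python
import numpy as np

def generate_patch(z, coord, size=16):
    """Toy stand-in for a coordinate-conditional generator.

    Returns a (size, size) patch whose top-left corner sits at the
    global pixel coordinate `coord`, computed as a smooth function of
    the shared latent z and the absolute coordinates.
    """
    y0, x0 = coord
    yy, xx = np.meshgrid(y0 + np.arange(size),
                         x0 + np.arange(size), indexing="ij")
    return np.sin(0.1 * yy + z[0]) * np.cos(0.1 * xx + z[1])

def assemble(z, n_rows, n_cols, size=16, origin=(0, 0)):
    """Generate an image patch-by-patch and stitch the patches together.

    Each patch is produced independently (on-demand, parallelizable);
    only patch coordinates are shared, mirroring the divide-and-conquer
    paradigm. Shifting `origin` outside the training canvas corresponds
    to coordinate extrapolation.
    """
    full = np.zeros((n_rows * size, n_cols * size))
    for i in range(n_rows):
        for j in range(n_cols):
            coord = (origin[0] + i * size, origin[1] + j * size)
            full[i * size:(i + 1) * size,
                 j * size:(j + 1) * size] = generate_patch(z, coord, size)
    return full

z = np.array([0.3, 0.7])
full = assemble(z, 2, 2, size=16)            # 32x32 image from four 16x16 patches
beyond = assemble(z, 3, 3, size=16,
                  origin=(-16, -16))         # 48x48: larger than the original canvas
```

Because each patch depends only on `z` and its coordinates, the assembled 32x32 image is identical to evaluating the same function over the full canvas in one shot; in the trained model this seamlessness must instead be learned, which is what the discriminator's edge-crossing continuity objective enforces.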
