TSIT: A Simple and Versatile Framework for Image-to-Image Translation

We introduce a simple and versatile framework for image-to-image translation. We unearth the importance of normalization layers, and provide a carefully designed two-stream generative model with newly proposed feature transformations in a coarse-to-fine fashion. This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network, permitting our method to scale to various tasks in both unsupervised and supervised settings. No additional constraints (e.g., cycle consistency) are needed, contributing to a very clean and simple method. Multi-modal image synthesis with arbitrary style control is made possible. A systematic study compares the proposed method against several state-of-the-art task-specific baselines, verifying its effectiveness in terms of both perceptual quality and quantitative metrics.
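The abstract describes two kinds of feature transformation: a content stream that injects multi-scale semantic structure into the generator, and a style stream that injects style statistics, with no cycle-consistency or other auxiliary constraints. The sketch below is one plausible reading of these operations in PyTorch, following the SPADE-style spatial denormalization and AdaIN ideas this line of work builds on. All class, function, and parameter names (`FADE`, `feature_adain`, `hidden`, etc.) are hypothetical illustrations, not the authors' reference implementation.

```python
# Illustrative sketch of two-stream feature transformations, assuming a
# SPADE-like content-conditioned denormalization plus feature-level AdaIN.
# Names and hyperparameters are hypothetical.
import torch
import torch.nn as nn


class FADE(nn.Module):
    """Content-conditioned denormalization (hypothetical sketch).

    Normalizes a generator feature map, then re-modulates it with
    per-pixel scale/shift maps predicted from the content-stream
    feature at the matching resolution.
    """

    def __init__(self, gen_channels: int, content_channels: int, hidden: int = 128):
        super().__init__()
        # Parameter-free normalization; modulation comes from the content stream.
        self.norm = nn.BatchNorm2d(gen_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(content_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_gamma = nn.Conv2d(hidden, gen_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hidden, gen_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, content_feat: torch.Tensor) -> torch.Tensor:
        h = self.shared(content_feat)
        gamma, beta = self.to_gamma(h), self.to_beta(h)
        # Spatially varying modulation preserves semantic layout.
        return self.norm(x) * (1.0 + gamma) + beta


def feature_adain(x: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """AdaIN applied between feature maps: align the channel-wise mean/std
    of the generator feature x with those of the style-stream feature."""
    x_mean = x.mean(dim=(2, 3), keepdim=True)
    x_std = x.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (x - x_mean) / x_std + s_mean
```

Under this reading, the scale and shift maps are predicted per pixel from the content stream at every resolution, so structure is injected coarse-to-fine, while the style transformation only matches channel-wise statistics; swapping the style input then changes appearance without disturbing layout, which is consistent with the multi-modal, arbitrary style control the abstract claims without extra losses.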
