Deep Translation Prior: Test-time Training for Photorealistic Style Transfer

Recent techniques for photorealistic style transfer with deep convolutional neural networks (CNNs) generally require intensive training on large-scale datasets, and thus have limited applicability and poor generalization to unseen images or styles. To overcome this, we propose a novel framework, dubbed Deep Translation Prior (DTP), which accomplishes photorealistic style transfer through test-time training on a given input image pair with untrained networks, learning an image pair-specific translation prior and thereby yielding better performance and generalization. Tailored to such test-time training for style transfer, we present novel network architectures, consisting of a correspondence module and a generation module, together with loss functions comprising contrastive content, style, and cycle consistency losses. Our framework requires no offline training phase for style transfer, which has been one of the main challenges of existing methods; instead, the networks are learned solely at test time. Experimental results show that our framework generalizes better to unseen image pairs and even outperforms state-of-the-art methods.
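
The core idea, as described above, is that nothing is learned offline: a randomly initialized network is optimized directly on the single content/style pair at test time. Below is a minimal sketch of such a test-time-training loop under stated assumptions; the tiny architecture, the placeholder pixel-space content loss, and the Gram-matrix style loss are illustrative stand-ins and are not the paper's actual correspondence/generation modules or its contrastive content, style, and cycle consistency losses.

```python
# Minimal sketch of test-time training on one image pair (in the spirit of DTP).
# All module names and losses here are illustrative placeholders, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGenerator(nn.Module):
    """A small, randomly initialized CNN standing in for a generation module."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)


def gram(feat):
    # Gram matrix of feature maps: a common stand-in style statistic.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def test_time_stylize(content, style, steps=300, lr=1e-3):
    """Optimize an untrained generator on a single (content, style) pair only."""
    gen = TinyGenerator()
    opt = torch.optim.Adam(gen.parameters(), lr=lr)
    for _ in range(steps):
        out = gen(content)
        # Placeholder objective: preserve content in pixel space and match
        # style statistics via Gram matrices. The paper instead uses
        # contrastive content, style, and cycle consistency losses.
        loss = F.l1_loss(out, content) + F.mse_loss(gram(out), gram(style))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return gen(content).detach()


if __name__ == "__main__":
    content = torch.rand(1, 3, 256, 256)  # stand-ins for the given image pair
    style = torch.rand(1, 3, 256, 256)
    result = test_time_stylize(content, style, steps=10)
    print(result.shape)  # torch.Size([1, 3, 256, 256])
```

Because the network is fit to one pair only, no large-scale dataset or offline training phase is involved; the trade-off is that optimization runs anew for every input pair.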
