Real-time Localized Photorealistic Video Style Transfer

We present a novel algorithm for transferring the artistic styles of semantically meaningful local regions of an image onto local regions of a target video while preserving the video's photorealism. Local regions may be selected fully automatically from an image, by using video segmentation algorithms, or from casual user guidance such as scribbles. Our method, based on a deep neural network architecture inspired by recent work in photorealistic style transfer, runs in real time and, once trained on a diverse dataset of artistic styles, works on arbitrary inputs without runtime optimization. By augmenting our video dataset with noisy semantic labels and jointly optimizing over style, content, mask, and temporal losses, our method copes with a variety of imperfections in the input and produces temporally coherent videos without visual artifacts. We demonstrate our method on a variety of style images and target videos, including transferring different styles onto multiple objects simultaneously and smoothly transitioning between styles over time.
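To make the idea of localized, multi-object stylization and style transitions concrete, the following is a minimal illustrative sketch, not the network described above. It assumes that per-style stylized frames and soft region masks are already available, and simply composites them in pixel space; the function names `composite_local_styles` and `crossfade` are hypothetical and introduced here only for illustration.

```python
import numpy as np

def composite_local_styles(frame, stylized, masks):
    """Blend per-region stylized frames into the original frame.

    Assumed inputs (illustrative only):
      frame:    (H, W, 3) float array in [0, 1], the original video frame
      stylized: list of (H, W, 3) float arrays, one full-frame result per style
      masks:    list of (H, W) float arrays in [0, 1], one soft mask per style
    """
    out = frame.astype(np.float64).copy()
    for styled, mask in zip(stylized, masks):
        # Add a channel axis so the mask broadcasts over RGB.
        m = np.clip(mask, 0.0, 1.0)[..., None]
        # Inside the mask take the stylized pixels, outside keep what we have.
        out = (1.0 - m) * out + m * styled
    return out

def crossfade(styled_a, styled_b, t):
    """Linear blend between two stylizations; t=0 gives A, t=1 gives B."""
    t = float(np.clip(t, 0.0, 1.0))
    return (1.0 - t) * styled_a + t * styled_b

if __name__ == "__main__":
    # Toy 2x2 frame with two single-pixel "objects", each given its own style.
    frame = np.zeros((2, 2, 3))
    style_1 = np.ones((2, 2, 3))
    style_2 = np.full((2, 2, 3), 0.5)
    mask_1 = np.array([[1.0, 0.0], [0.0, 0.0]])
    mask_2 = np.array([[0.0, 1.0], [0.0, 0.0]])
    result = composite_local_styles(frame, [style_1, style_2], [mask_1, mask_2])
    print(result[..., 0])
    # Halfway through a transition from style 1 to style 2.
    print(crossfade(style_1, style_2, 0.5)[0, 0])
```

Pixel-space compositing of this kind does nothing to handle imperfect masks or temporal flicker, which the mask and temporal losses in the abstract are described as addressing during training.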
