Stable Video Style Transfer Based on Partial Convolution with Depth-Aware Supervision

As an important research topic in digital media art, neural video style transfer has attracted increasing attention. Many recent works incorporate optical flow into image style transfer frameworks to preserve inter-frame coherence and prevent flicker. However, these methods rely heavily on paired datasets of content videos and stylized videos, which are often difficult to obtain. Another limitation of existing methods is that, while maintaining inter-frame coherence, they introduce strong ghosting artifacts. To address these problems, this paper makes the following contributions: (1) it presents a novel training framework for video style transfer that does not depend on a video dataset of the target style; (2) it is the first to focus on the ghosting problem present in most previous works, using a partial convolution-based strategy to exploit inter-frame context and correlation, together with an additional depth loss that constrains the generated frames, suppressing ghosting artifacts while preserving stability. Extensive experiments demonstrate that our method produces natural and stable video frames in the target style. Qualitative and quantitative comparisons also show that the proposed approach outperforms previous works in terms of overall image quality and inter-frame stability. To facilitate future research, we publish our experiment code at https://github.com/Huage001/Artistic-Video-Partial-Conv-Depth-Loss.
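To make the depth supervision concrete, below is a minimal sketch of a depth-consistency loss of the kind the abstract describes. The specific formulation is an assumption, not the paper's exact definition: it assumes some frozen monocular depth estimator `depth_net` (any pretrained single-image depth network would do) and penalizes the L2 distance between the depth maps predicted for the stylized frame and for the original content frame.

```python
# Hypothetical sketch of a depth-consistency loss; the exact loss used in
# the paper may differ (e.g. in norm choice or depth-network architecture).
import torch
import torch.nn.functional as F

def depth_loss(stylized: torch.Tensor,
               content: torch.Tensor,
               depth_net: torch.nn.Module) -> torch.Tensor:
    """L2 distance between predicted depth maps of stylized and content frames.

    stylized, content: (N, 3, H, W) image batches in the same value range
    depth_net: a frozen monocular depth estimator returning (N, 1, H, W) maps
    """
    with torch.no_grad():
        # Depth of the original frame serves as the fixed supervision target,
        # so no gradient flows into the depth network from this branch.
        target_depth = depth_net(content)
    # Depth of the stylized frame stays differentiable so the gradient can
    # push the stylization network toward depth-preserving outputs.
    pred_depth = depth_net(stylized)
    return F.mse_loss(pred_depth, target_depth)
```

In training, this term would simply be added to the usual perceptual content and style losses with a weighting coefficient, encouraging the stylized frame to keep the scene geometry of its source frame.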
