Interactive video stylization using few-shot patch-based training

In this paper, we present a learning-based method for keyframe-based video stylization that allows an artist to propagate the style from a few selected keyframes to the rest of the sequence. Its key advantage is that the resulting stylization is semantically meaningful, i.e., specific parts of moving objects are stylized according to the artist's intention. In contrast to previous style transfer techniques, our approach requires neither a lengthy pre-training process nor a large training dataset. We demonstrate how to train an appearance translation network from scratch using only a few stylized exemplars while implicitly preserving temporal consistency. This leads to a video stylization framework that supports real-time inference, parallel processing, and random access to an arbitrary output frame. It can also merge content from multiple keyframes without an explicit blending operation. We demonstrate its practical utility in various interactive scenarios, where the user paints over a selected keyframe and sees her style transferred to an existing recorded sequence or a live video stream.
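
The core idea, training a translation network on aligned patches cropped from a single keyframe pair and then applying it to full frames, can be illustrated with a minimal sketch in PyTorch. The tiny fully convolutional network, patch size, step count, and plain L1 loss below are illustrative assumptions for exposition, not the paper's exact architecture or training schedule.

```python
# Minimal sketch of the few-shot patch-based training idea in PyTorch.
# Hypothetical network and hyperparameters; not the authors' exact setup.
import torch
import torch.nn as nn


class PatchTranslator(nn.Module):
    """Small appearance translation network.

    Fully convolutional, so it can be trained on small patches and later
    applied to full frames of arbitrary resolution.
    """

    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)


def sample_patches(frame, styled, patch=32, batch=64):
    """Crop aligned random patches from a keyframe and its stylized version."""
    _, h, w = frame.shape
    xs = torch.randint(0, w - patch, (batch,)).tolist()
    ys = torch.randint(0, h - patch, (batch,)).tolist()
    src = torch.stack([frame[:, y:y + patch, x:x + patch] for x, y in zip(xs, ys)])
    tgt = torch.stack([styled[:, y:y + patch, x:x + patch] for x, y in zip(xs, ys)])
    return src, tgt


def train_on_keyframe(frame, styled, steps=2000, lr=2e-4):
    """Train the network from scratch on a single (keyframe, stylized) pair."""
    model = PatchTranslator()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(steps):
        src, tgt = sample_patches(frame, styled)
        opt.zero_grad()
        loss_fn(model(src), tgt).backward()
        opt.step()
    return model


# Usage: frame and styled are (3, H, W) tensors in [0, 1]. Once trained,
# every video frame is stylized independently with one forward pass, which
# is what enables random access and parallel processing.
frame = torch.rand(3, 256, 256)
styled = torch.rand(3, 256, 256)
model = train_on_keyframe(frame, styled, steps=100)
with torch.no_grad():
    stylized_frame = model(frame.unsqueeze(0)).squeeze(0)
```

Because the network is fully convolutional, patch-level training transfers directly to full-frame inference, and since each frame is processed independently there is no sequential dependency between output frames, consistent with the random-access property described above.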
