DeepLandscape: Adversarial Modeling of Landscape Videos

We build a new model of landscape videos that can be trained on a mixture of static landscape images as well as landscape animations. Our architecture extends StyleGAN model by augmenting it with parts that allow to model dynamic changes in a scene. Once trained, our model can be used to generate realistic time-lapse landscape videos with moving objects and time-of-the-day changes. Furthermore, by fitting the learned models to a static landscape image, the latter can be reenacted in a realistic way. We propose simple but necessary modifications to StyleGAN inversion procedure, which lead to in-domain latent codes and allow to manipulate real images. Quantitative comparisons and user studies suggest that our model produces more compelling animations of given photographs than previously proposed methods. The results of our approach including comparisons with prior art can be seen in supplementary materials and on the project page https://saic-mdal.github.io/deeplandscape/.

[1]  Bolei Zhou,et al.  Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.

[2]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[3]  Wenmin Wang,et al.  Video Imagination from a Single Image with Transformation Generation , 2017, ACM Multimedia.

[4]  Yann LeCun,et al.  Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[5]  Seunghoon Hong,et al.  Decomposing Motion and Content for Natural Video Sequence Prediction , 2017, ICLR.

[6]  Andrew Zisserman,et al.  A Short Note about Kinetics-600 , 2018, ArXiv.

[7]  Wei Xiong,et al.  Learning to Generate Time-Lapse Videos Using Multi-stage Dynamic Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Jan Kautz,et al.  MoCoGAN: Decomposing Motion and Content for Video Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Ming-Hsuan Yang,et al.  Flow-Grounded Spatial-Temporal Video Prediction from Still Images , 2018, ECCV.

[10]  Vighnesh Birodkar,et al.  Unsupervised Learning of Disentangled Representations from Video , 2017, NIPS.

[11]  Jeff Donahue,et al.  Efficient Video Generation on Complex Datasets , 2019, ArXiv.

[12]  Sergey Levine,et al.  Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.

[13]  Marco Tagliasacchi,et al.  From Here to There: Video Inbetweening Using Direct 3D Convolutions , 2019, ArXiv.

[14]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Jan Kautz,et al.  Video-to-Video Synthesis , 2018, NeurIPS.

[16]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[17]  Lars M. Mescheder,et al.  On the convergence properties of GAN training , 2018, ArXiv.

[18]  Yu Qiao,et al.  ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks , 2018, ECCV Workshops.

[19]  Konstantinos G. Derpanis,et al.  Two-Stream Convolutional Networks for Dynamic Texture Synthesis , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[21]  Kaiqi Huang,et al.  Fast End-to-End Trainable Guided Filter , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Victor Lempitsky,et al.  Few-Shot Adversarial Learning of Realistic Neural Talking Head Models , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Sjoerd van Steenkiste,et al.  Towards Accurate Generative Models of Video: A New Metric & Challenges , 2018, ArXiv.

[26]  Antonio Torralba,et al.  Generating Videos with Scene Dynamics , 2016, NIPS.

[27]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[28]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[29]  Xiaogang Wang,et al.  Video Generation From Single Semantic Label Map , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  David Salesin,et al.  Animating pictures with stochastic motion textures , 2005, ACM Trans. Graph..

[31]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[32]  Marc'Aurelio Ranzato,et al.  Transformation-Based Models of Video Sequences , 2017, ArXiv.

[33]  Harry Shum,et al.  Synthesizing Dynamic Texture with Closed-Loop Linear Dynamic System , 2004, ECCV.

[34]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[36]  Bolei Zhou,et al.  Semantic photo manipulation with a generative image prior , 2019, ACM Trans. Graph..

[37]  Marco Körner,et al.  FutureGAN: Anticipating the Future Frames of Video Sequences using Spatio-Temporal 3d Convolutions in Progressively Growing Autoencoder GANs , 2018, ArXiv.

[38]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Yoshihiro Kanamori,et al.  Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis , 2019 .

[40]  Tali Dekel,et al.  SinGAN: Learning a Generative Model From a Single Natural Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Peter Wonka,et al.  Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[42]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.