Towards an Understanding of Our World by GANing Videos in the Wild

Existing generative video models work well only for videos with a static background. For dynamic scenes, these models require an extra background-stabilization pre-processing step, and for many videos in the wild stabilization is simply impossible. To the best of our knowledge, we present the first video generation framework that works in the wild, without making any assumptions about the videos' content, which lets us dispense with background stabilization entirely. The proposed method also outperforms state-of-the-art methods even when the static-background assumption holds. We achieve this with a robust one-stream video generation architecture that exploits the Wasserstein GAN framework for better convergence. Because the one-stream architecture does not formally distinguish between foreground and background, it can generate, and learn from, videos with dynamic backgrounds. We demonstrate the strength of our model by successfully applying it to three challenging problems: video colorization, video inpainting, and future prediction.
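To make the "one-stream video GAN trained with a Wasserstein objective" idea concrete, here is a minimal PyTorch sketch, not the authors' released code: a single 3D-convolutional generator synthesizes whole clips (foreground and background together, with no separate background stream), and a 3D-convolutional critic is trained with the WGAN gradient penalty. All class names, layer sizes, the clip shape (3×32×64×64), and hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VideoGenerator(nn.Module):
    """One-stream generator: latent vector -> full video clip (C, T, H, W)."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # project z to a 512 x 2 x 4 x 4 spatio-temporal volume
            nn.ConvTranspose3d(z_dim, 512, kernel_size=(2, 4, 4)),
            nn.BatchNorm3d(512), nn.ReLU(inplace=True),
            # each block doubles the temporal and spatial resolution
            nn.ConvTranspose3d(512, 256, 4, stride=2, padding=1),
            nn.BatchNorm3d(256), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),
            nn.BatchNorm3d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 3, 4, stride=2, padding=1),
            nn.Tanh(),  # clip in [-1, 1], shape 3 x 32 x 64 x 64
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1, 1))

class VideoCritic(nn.Module):
    """3D-conv critic; no batch norm or sigmoid, per WGAN-GP practice."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(512, 1, kernel_size=(2, 4, 4)),  # scalar score per clip
        )

    def forward(self, x):
        return self.net(x).view(x.size(0))

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP penalty on random interpolates between real and fake clips."""
    eps = torch.rand(real.size(0), 1, 1, 1, 1, device=real.device)
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(mix).sum(), mix, create_graph=True)[0]
    return lam * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```

A training iteration would then minimize `critic(fake).mean() - critic(real).mean() + gradient_penalty(critic, real, fake)` for the critic and `-critic(fake).mean()` for the generator. The key design point this sketch illustrates is that nothing in the generator separates foreground from background, so the same stream is free to model camera motion and dynamic scenes.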
