Improving Video Generation for Multi-functional Applications

In this paper, we aim to improve state-of-the-art video generative adversarial networks (GANs) with a view towards multi-functional applications. Our improved video GAN model does not separate foreground from background or dynamic from static patterns, but learns to generate the entire video clip jointly. Our model can thus be trained on, and generate, a broad set of videos without restriction. This is achieved by designing a robust one-stream video generation architecture combined with an extension of the state-of-the-art Wasserstein GAN framework that allows for better convergence. Experimental results show that our improved video GAN model outperforms state-of-the-art video generative models on multiple challenging datasets. Furthermore, we demonstrate the versatility of our model by successfully extending it to three challenging problems: video colorization, video inpainting, and future prediction. To the best of our knowledge, this is the first work to use GANs to colorize and inpaint video clips.
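To make the training objective concrete, the sketch below illustrates the kind of improved Wasserstein GAN (WGAN-GP) critic loss the abstract refers to, adapted to video clips represented as 5-D tensors (batch, channels, time, height, width). This is a minimal sketch under assumed conventions, not the authors' released code: the `critic` module, the `gp_weight` default, and the tensor layout are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code) of a WGAN-GP critic
# loss applied to video clips. Clips are 5-D tensors:
# (batch, channels, time, height, width). `critic` is a hypothetical
# nn.Module mapping a clip batch to one scalar score per clip.
import torch

def critic_loss(critic, real_clips, fake_clips, gp_weight=10.0):
    """Wasserstein critic loss with a gradient penalty term."""
    batch_size = real_clips.size(0)

    # Wasserstein distance estimate: the critic should score
    # real clips high and generated clips low.
    w_loss = critic(fake_clips).mean() - critic(real_clips).mean()

    # Gradient penalty on random interpolates between real and fake
    # clips, enforcing a soft 1-Lipschitz constraint on the critic.
    eps = torch.rand(batch_size, 1, 1, 1, 1, device=real_clips.device)
    interp = (eps * real_clips + (1.0 - eps) * fake_clips).requires_grad_(True)
    grads = torch.autograd.grad(
        outputs=critic(interp).sum(), inputs=interp, create_graph=True
    )[0]
    grad_norm = grads.reshape(batch_size, -1).norm(2, dim=1)
    penalty = gp_weight * ((grad_norm - 1.0) ** 2).mean()

    return w_loss + penalty
```

In such a setup, the one-stream generator (e.g., a stack of 3-D transposed convolutions mapping a noise vector to a clip) would be updated on alternating steps to maximize the critic's score on its samples, i.e., to minimize `-critic(fake_clips).mean()`.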
