An Internal Learning Approach to Video Inpainting

We propose a novel video inpainting algorithm that simultaneously hallucinates missing appearance and motion (optical flow) information, building upon the recent 'Deep Image Prior' (DIP) that exploits convolutional network architectures to enforce plausible texture in static images. In extending DIP to video we make two important contributions. First, we show that coherent video inpainting is possible without a priori training. We take a generative approach to inpainting based on internal (within-video) learning without reliance upon an external corpus of visual data to train a one-size-fits-all model for the large space of general videos. Second, we show that such a framework can jointly generate both appearance and flow, whilst exploiting these complementary modalities to ensure mutual consistency. We show that leveraging appearance statistics specific to each video achieves visually plausible results whilst handling the challenging problem of long-term consistency.

[1]  Oliver Grau,et al.  How Not to Be Seen — Object Removal from Videos of Crowded Scenes , 2012, Comput. Graph. Forum.

[2]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[3]  Chuan Wang,et al.  Video Inpainting by Jointly Learning Temporal Structure and Spatial Details , 2018, AAAI.

[4]  Bolei Zhou,et al.  Semantic photo manipulation with a generative image prior , 2019, ACM Trans. Graph..

[5]  Alexei A. Efros,et al.  Everybody Dance Now , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Fang-Lue Zhang,et al.  A survey of the state-of-the-art in patch-based synthesis , 2017, Computational Visual Media.

[7]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2008, Commun. ACM.

[10]  Eli Shechtman,et al.  PatchMatch: a randomized correspondence algorithm for structural image editing , 2009, ACM Trans. Graph..

[11]  Feng Liu,et al.  Video Frame Interpolation via Adaptive Separable Convolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Harry Shum,et al.  Image completion with structure propagation , 2005, ACM Trans. Graph..

[13]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[14]  Narendra Ahuja,et al.  Image completion using planar structure guidance , 2014, ACM Trans. Graph..

[15]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Narendra Ahuja,et al.  Temporally coherent completion of dynamic video , 2016, ACM Trans. Graph..

[17]  Patrick Pérez,et al.  Video Inpainting of Complex Scenes , 2014, SIAM J. Imaging Sci..

[18]  Zuowei Shen,et al.  Robust video denoising using low rank matrix completion , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Khaled F. Hussain,et al.  Fast Video Completion using patch-based synthesis and image registration , 2014, 2014 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS).

[20]  Hiroshi Ishikawa,et al.  Globally and locally consistent image completion , 2017, ACM Trans. Graph..

[21]  Michal Irani,et al.  Super-resolution from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22]  Jian Sun,et al.  Image Completion Approaches Using the Statistics of Similar Patches , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  In So Kweon,et al.  Deep Video Inpainting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Nikos Komodakis,et al.  Image Completion Using Global Optimization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25]  Christine Guillemot,et al.  Hierarchical Super-Resolution-Based Inpainting , 2013, IEEE Transactions on Image Processing.

[26]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[27]  Wei Wang,et al.  Video Super-Resolution via Bidirectional Recurrent Convolutional Networks , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Li Fei-Fei,et al.  Characterizing and Improving Stability in Neural Style Transfer , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Eli Shechtman,et al.  Improving patch-based synthesis by learning patch masks , 2014, 2014 IEEE International Conference on Computational Photography (ICCP).

[31]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Adithya Reddy Nallabolu Unsupervised Learning of Spatiotemporal Features by Video Completion , 2017 .

[33]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[34]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[35]  Andrea Vedaldi,et al.  Deep Image Prior , 2017, International Journal of Computer Vision.

[36]  Christine Guillemot,et al.  Video Inpainting With Short-Term Windows: Application to Object Removal and Error Concealment , 2015, IEEE Transactions on Image Processing.

[37]  Jan Kautz,et al.  Video-to-Video Synthesis , 2018, NeurIPS.

[38]  Jan Kautz,et al.  Background Inpainting for Videos with Dynamic Objects and a Free-Moving Camera , 2012, ECCV.

[39]  Jan Kautz,et al.  Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Zongben Xu,et al.  Image Inpainting by Patch Propagation Using Patch Sparsity , 2010, IEEE Transactions on Image Processing.

[41]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Yann Gousseau,et al.  Motion-consistent video inpainting , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[43]  Michal Irani,et al.  “Double-DIP”: Unsupervised Image Decomposition via Coupled Deep-Image-Priors , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[45]  Bolei Zhou,et al.  Deep Flow-Guided Video Inpainting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Eli Shechtman,et al.  Space-time video completion , 2004, CVPR 2004.

[47]  Hao Li,et al.  High-Resolution Image Inpainting Using Multi-scale Neural Patch Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Michal Irani,et al.  "Zero-Shot" Super-Resolution Using Deep Internal Learning , 2017, CVPR.

[49]  Min H. Kim,et al.  Laplacian Patch-Based Image Synthesis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).