Deep Interactive Video Inpainting: An Invisibility Cloak for Harry Potter

In this paper, we propose a new task, deep interactive video inpainting, together with an application that lets users interact with the system. To the best of our knowledge, this is the first deep learning-based interactive video inpainting framework that uses only free-form user input (i.e., scribbles) as guidance instead of mask annotations, a setting with academic, entertainment, and commercial value. Given a user's scribbles on a chosen frame, the framework simultaneously performs interactive video object segmentation and video inpainting throughout the whole video. To achieve this, we use a shared spatial-temporal memory module that combines segmentation and inpainting into an end-to-end pipeline. In our framework, past frames with object masks (either the user's scribbles or the predicted masks) constitute an external memory, and the current frame, acting as the query, is segmented and inpainted by reading the visual cues stored in that memory. Furthermore, our method allows users to iteratively refine the segmentation results, which improves inpainting on frames where the initial segmentation is poor. Hence, high-quality video inpainting results can be obtained even on challenging video sequences. Qualitative and quantitative experimental results demonstrate the superiority of our approach.
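
The memory read described above can be illustrated with a minimal sketch. It assumes a dot-product attention read in the style of space-time memory networks; the abstract does not specify the actual read operation, and the function name `memory_read`, the key/value split, and all shapes are hypothetical choices made for illustration only.

```python
import numpy as np

def memory_read(query_key, memory_keys, memory_values):
    """Read from an external memory with softmax attention (illustrative only).

    query_key:     (N_q, C_k) features of the current frame's pixels (assumed)
    memory_keys:   (N_m, C_k) features of past-frame pixels (assumed)
    memory_values: (N_m, C_v) visual cues stored for past frames, e.g. features
                   encoded together with object masks (assumed)
    Returns the read-out of shape (N_q, C_v) and the attention weights (N_q, N_m).
    """
    # Similarity between every query location and every memory location.
    scores = query_key @ memory_keys.T
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each query location receives a weighted sum of the memory values.
    return weights @ memory_values, weights

# Toy example: 4 query locations reading from 6 memory locations.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 16))
read_out, attn = memory_read(q, k, v)
```

In a shared-memory design such as the one the abstract describes, a read-out like `read_out` would presumably feed both a segmentation decoder and an inpainting decoder; that wiring is not shown here because the abstract gives no detail on it.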
