Deep Video Inpainting Detection

This paper studies video inpainting detection, which localizes an inpainted region in a video both spatially and temporally. In particular, we introduce VIDNet, Video Inpainting Detection Network, which contains a two-stream encoder-decoder architecture with attention module. To reveal artifacts encoded in compression, VIDNet additionally takes in Error Level Analysis frames to augment RGB frames, producing multimodal features at different levels with an encoder. Exploring spatial and temporal relationships, these features are further decoded by a Convolutional LSTM to predict masks of inpainted regions. In addition, when detecting whether a pixel is inpainted or not, we present a quad-directional local attention module that borrows information from its surrounding pixels from four directions. Extensive experiments are conducted to validate our approach. We demonstrate, among other things, that VIDNet not only outperforms by clear margins alternative inpainting detection methods but also generalizes well on novel videos that are unseen during training.

[1]  In So Kweon,et al.  Deep Video Inpainting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Adam Finkelstein,et al.  PatchMatch: a randomized correspondence algorithm for structural image editing , 2009, SIGGRAPH 2009.

[3]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2007, SIGGRAPH 2007.

[4]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[5]  Narendra Ahuja,et al.  Temporally coherent completion of dynamic video , 2016, ACM Trans. Graph..

[6]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7]  Premkumar Natarajan,et al.  ManTra-Net: Manipulation Tracing Network for Detection and Localization of Image Forgeries With Anomalous Features , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Wei Xiong,et al.  Foreground-Aware Image Inpainting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Hiroshi Ishikawa,et al.  Globally and locally consistent image completion , 2017, ACM Trans. Graph..

[10]  Richard S. Zemel,et al.  End-to-End Instance Segmentation with Recurrent Attention , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Andrew Owens,et al.  Fighting Fake News: Image Splice Detection via Learned Self-Consistency , 2018, ECCV.

[12]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Winston H. Hsu,et al.  Free-Form Video Inpainting With 3D Gated Convolution and Temporal PatchGAN , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Andreas Rössler,et al.  ForensicTransfer: Weakly-supervised Domain Adaptation for Forgery Detection , 2018, ArXiv.

[16]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[17]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[18]  Ning Xu,et al.  An Internal Learning Approach to Video Inpainting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Seoung Wug Oh,et al.  Copy-and-Paste Networks for Deep Video Inpainting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Ming-Hsuan Yang,et al.  Learning Recursive Filters for Low-Level Vision via a Hybrid Neural Network , 2016, ECCV.

[22]  Xianfeng Zhao,et al.  A deep learning approach to patch-based image inpainting forensics , 2018, Signal Process. Image Commun..

[23]  Yiannis Kompatsiaris,et al.  Large-scale evaluation of splicing localization algorithms for web images , 2017, Multimedia Tools and Applications.

[24]  Alessandro Piva,et al.  Image Forgery Localization via Fine-Grained Analysis of CFA Artifacts , 2012, IEEE Transactions on Information Forensics and Security.

[25]  C.-C. Jay Kuo,et al.  Image Splicing Localization using a Multi-task Fully Convolutional Network (MFCN) , 2017, J. Vis. Commun. Image Represent..

[26]  Chuan Wang,et al.  Video Inpainting by Jointly Learning Temporal Structure and Spatial Details , 2018, AAAI.

[27]  Larry S. Davis,et al.  Learning Rich Features for Image Manipulation Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Vicent Caselles,et al.  Exemplar-Based Image Inpainting Using Multiscale Graph Cuts , 2013, IEEE Transactions on Image Processing.

[29]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[30]  Davide Cozzolino,et al.  Single-image splicing localization through autoencoder-based anomaly detection , 2016, 2016 IEEE International Workshop on Information Forensics and Security (WIFS).

[31]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[32]  Jing Dong,et al.  Tampered Region Localization of Digital Color Images Based on JPEG Compression Noise , 2010, IWDW.

[33]  Ling Shao,et al.  See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Jiwu Huang,et al.  Localization of Deep Inpainting Using High-Pass Fully Convolutional Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Yiannis Kompatsiaris,et al.  Detecting image splicing in the wild (WEB) , 2015, 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[36]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[37]  Larry S. Davis,et al.  Generate, Segment, and Refine: Towards Generic Manipulation Segmentation , 2018, AAAI.

[38]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[39]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Jian Sun,et al.  Image Completion Approaches Using the Statistics of Similar Patches , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Bolei Zhou,et al.  Deep Flow-Guided Video Inpainting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Seoung Wug Oh,et al.  Onion-Peel Networks for Deep Video Completion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Qiong Wu,et al.  Detection of digital doctoring in exemplar-based inpainted images , 2008, 2008 International Conference on Machine Learning and Cybernetics.

[44]  Babak Mahdian,et al.  Using noise inconsistencies for blind image forensics , 2009, Image Vis. Comput..