Exploring Temporal Coherence for More General Video Face Forgery Detection

Although current face manipulation techniques achieve impressive performance regarding quality and controllability, they are struggling to generate temporal coherent face videos. In this work, we explore to take full advantage of the temporal coherence for video face forgery detection. To achieve this, we propose a novel end-to-end framework, which consists of two major stages. The first stage is a fully temporal convolution network (FTCN). The key insight of FTCN is to reduce the spatial convolution kernel size to 1, while maintaining the temporal convolution kernel size un-changed. We surprisingly find this special design can benefit the model for extracting the temporal features as well as improve the generalization capability. The second stage is a Temporal Transformer network, which aims to explore the long-term temporal coherence. The proposed frame-work is general and flexible, which can be directly trained from scratch without any pre-training models or external datasets. Extensive experiments show that our framework outperforms existing methods and remains effective when applied to detect new sorts of face forgery videos.

[1]  Iacopo Masi,et al.  Two-branch Recurrent Network for Isolating Deepfakes in Videos , 2020, ECCV.

[2]  Dinesh Manocha,et al.  Emotions Don't Lie: An Audio-Visual Deepfake Detection Method using Affective Cues , 2020, ACM Multimedia.

[3]  Alexei A. Efros,et al.  CNN-Generated Images Are Surprisingly Easy to Spot… for Now , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Yutaka Satoh,et al.  Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[5]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[6]  Dorothea Kolossa,et al.  Leveraging Frequency Analysis for Deep Fake Image Recognition , 2020, ICML.

[7]  David Bau,et al.  What makes fake images detectable? Understanding properties that generalize , 2020, ECCV.

[8]  Chen Change Loy,et al.  DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Xu Zhang,et al.  Detecting and Simulating Artifacts in GAN Fake Images , 2019, 2019 IEEE International Workshop on Information Forensics and Security (WIFS).

[10]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[11]  Chia-Yen Lee,et al.  Learning to Detect Fake Face Images in the Wild , 2018, 2018 International Symposium on Computer, Consumer and Control (IS3C).

[12]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Maja Pantic,et al.  Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Brian Dolhansky,et al.  The DeepFake Detection Challenge Dataset , 2020, ArXiv.

[16]  Siwei Lyu,et al.  Exposing DeepFake Videos By Detecting Face Warping Artifacts , 2018, CVPR Workshops.

[17]  C.-C. Jay Kuo,et al.  Image Splicing Localization using a Multi-task Fully Convolutional Network (MFCN) , 2017, J. Vis. Commun. Image Represent..

[18]  Shiva K. Pentyala,et al.  Towards Generalizable Forgery Detection with Locality-aware AutoEncoder , 2019, ArXiv.

[19]  Junichi Yamagishi,et al.  MesoNet: a Compact Facial Video Forgery Detection Network , 2018, 2018 IEEE International Workshop on Information Forensics and Security (WIFS).

[20]  Siwei Lyu,et al.  In Ictu Oculi: Exposing AI Created Fake Videos by Detecting Eye Blinking , 2018, 2018 IEEE International Workshop on Information Forensics and Security (WIFS).

[21]  Shigeo Morishima,et al.  RSGAN: face swapping and editing using face and hair representation in latent spaces , 2018, SIGGRAPH Posters.

[22]  Margret Keuper,et al.  Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Premkumar Natarajan,et al.  Recurrent Convolutional Strategies for Face Manipulation Detection in Videos , 2019, CVPR Workshops.

[24]  Dong Chen,et al.  Advancing High Fidelity Identity Swapping for Forgery Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Lu Sheng,et al.  Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues , 2020, ECCV.

[26]  Sean Franklin,et al.  Deepfake Detection using Spatiotemporal Convolutional Networks , 2020, ArXiv.

[27]  Anil K. Jain,et al.  On the Detection of Digital Face Manipulation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Roberto Caldelli,et al.  Exploiting Prediction Error Inconsistencies through LSTM-based Classifiers to Detect Deepfake Videos , 2020, IH&MMSec.

[29]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Larry S. Davis,et al.  Two-Stream Neural Networks for Tampered Face Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[31]  Yichen Wei,et al.  Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Bin Li,et al.  Detection of Deep Network Generated Images Using Disparities in Color Components , 2018, ArXiv.

[33]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Yann LeCun,et al.  A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Hui Ding,et al.  Learning to Recognize Patch-Wise Consistency for Deepfake Detection , 2020, ArXiv.

[36]  Amit K. Roy-Chowdhury,et al.  Hybrid LSTM and Encoder–Decoder Architecture for Detection of Image Forgeries , 2019, IEEE Transactions on Image Processing.

[37]  Tal Hassner,et al.  DeepFake Detection Based on Discrepancies Between Faces and Their Context , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Hao Li,et al.  Protecting World Leaders Against Deep Fakes , 2019, CVPR Workshops.

[39]  Maja Pantic,et al.  End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs , 2019, CVPR Workshops.

[40]  Larry S. Davis,et al.  Learning Rich Features for Image Manipulation Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Ashraful Islam,et al.  DOA-GAN: Dual-Order Attentive Generative Adversarial Network for Image Copy-Move Forgery Detection and Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Andreas Rössler,et al.  FaceForensics++: Learning to Detect Manipulated Facial Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Belhassen Bayar,et al.  A Deep Learning Approach to Universal Image Manipulation Detection Using a New Convolutional Layer , 2016, IH&MMSec.

[45]  Justus Thies,et al.  Neural Voice Puppetry: Audio-driven Facial Reenactment , 2019, ECCV.

[46]  Justus Thies,et al.  Face2Face: Real-Time Face Capture and Reenactment of RGB Videos , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Justus Thies,et al.  Face2Face: real-time face capture and reenactment of RGB videos , 2019, Commun. ACM.

[48]  Gang Hua,et al.  Towards Open-Set Identity Preserving Face Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Sezer Karaoglu,et al.  Spatio-temporal Features for Generalized Detection of Deepfake Videos , 2020, ArXiv.

[50]  Justus Thies,et al.  Deferred Neural Rendering: Image Synthesis using Neural Textures , 2019 .

[51]  Alberto Del Bimbo,et al.  Deepfake Video Detection through Optical Flow Based CNN , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[52]  Junichi Yamagishi,et al.  Multi-task Learning for Detecting and Segmenting Manipulated Facial Images and Videos , 2019, 2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[53]  Fang Wen,et al.  Face X-Ray for More General Face Forgery Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Jing Dong,et al.  On the generalization of GAN image forensics , 2019, CCBR.

[55]  Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Tal Hassner,et al.  FSGAN: Subject Agnostic Face Swapping and Reenactment , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[57]  Maja Pantic,et al.  Realistic Speech-Driven Facial Animation with GANs , 2019, International Journal of Computer Vision.

[58]  Premkumar Natarajan,et al.  ManTra-Net: Manipulation Tracing Network for Detection and Localization of Image Forgeries With Anomalous Features , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Lucas Theis,et al.  Fast Face-Swap Using Convolutional Neural Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).