Masked Relation Learning for DeepFake Detection

DeepFake detection aims to differentiate falsified faces from real ones. Most approaches formulate it as a binary classification problem and mine only the local artifacts and inconsistencies of face forgery, neglecting the relations across local regions. Although several recent works explore local relation learning for DeepFake detection, they overlook the propagation of relational information, which limits their performance gains. To address these issues, this paper provides a new perspective by formulating DeepFake detection as a graph classification problem in which each facial region corresponds to a vertex. However, relational information with large redundancy hinders the expressiveness of graphs. Inspired by the success of masked modeling, we propose Masked Relation Learning, which reduces this redundancy to learn informative relational features. Specifically, a spatiotemporal attention module learns the attention features of multiple facial regions. A relation learning module then masks part of the correlations between regions to reduce redundancy and propagates the relational information across regions to capture irregularities from a global view of the graph. We empirically find that a moderate masking rate (e.g., 50%) brings the best performance gain. Experiments verify the effectiveness of Masked Relation Learning and demonstrate that our approach outperforms the state of the art by 2% AUC on cross-dataset DeepFake video detection. Code will be available at https://github.com/zimyang/MaskRelation.
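
The idea of masking part of the relations between regions and then propagating information across the graph can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `masked_relation_propagate`, the scaled dot-product relation score, the single propagation step, and the mean readout are all assumptions chosen only to make the mechanism concrete.

```python
import numpy as np

def masked_relation_propagate(features, mask_rate=0.5, seed=0):
    """Illustrative sketch (hypothetical helper, not the paper's code).

    features: (num_regions, dim) array, one row per facial region (vertex).
    Returns a graph-level vector and the masked relation weights.
    """
    rng = np.random.default_rng(seed)
    n, dim = features.shape

    # Pairwise relation scores between regions (assumed: scaled dot product).
    scores = features @ features.T / np.sqrt(dim)

    # Randomly mask a fraction of the relations; self-loops are always kept
    # so that every row retains at least one finite score.
    keep = rng.random((n, n)) >= mask_rate
    np.fill_diagonal(keep, True)
    scores = np.where(keep, scores, -np.inf)

    # Row-wise softmax over the surviving relations (masked entries become 0).
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    # One propagation step: each region aggregates its unmasked neighbours.
    propagated = weights @ features

    # Graph-level readout (assumed: mean over vertices) for classification.
    return propagated.mean(axis=0), weights

# Example: 6 hypothetical facial regions with 8-dimensional features.
region_features = np.random.default_rng(1).normal(size=(6, 8))
graph_vector, relation_weights = masked_relation_propagate(region_features)
```

A real model would learn the region features and relation scores end to end and feed the graph-level vector to a classifier; the sketch only shows where a masking rate such as 50% enters the computation.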
