Detecting Deepfake Video by Learning Two-Level Features with Two-Stream Convolutional Neural Network

Deepfake techniques has made face swapping in video easy to use. Nowadays, the spread of Deepfake videos over networks is concerned worldwide. This work proposes an approach to more accurate and robust detection of them. Since artifacts left by Deepfake tools can be largely categorized into two classes of different levels, i.e. semantic and noise level, we adopt a two-stream convolutional neural network (CNN) to capture the 2-level features concurrently. Xception network is trained only as the first stream to detect semantic anomalies such as the editing artifacts around face contour, detail missing, and geometric inconsistence in eyes. Meanwhile, the 2nd stream, which contain the constrained convolution filter and median filter, is designed to capture the tampering traces in local noises. By concatenating the 2-level features learned from the both streams, our method obtains very comprehensive knowledge about the existence of face swapping. The experimental results have shown its advantage over the existing methods on both the accuracy and robustness.

[1]  K. J. Ray Liu,et al.  Robust Median Filtering Forensics Using an Autoregressive Model , 2013, IEEE Transactions on Information Forensics and Security.

[2]  Andreas Rössler,et al.  FaceForensics++: Learning to Detect Manipulated Facial Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Xin Yang,et al.  Exposing Deep Fakes Using Inconsistent Head Poses , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4]  Siwei Lyu,et al.  In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking , 2018, ArXiv.

[5]  Tal Hassner,et al.  On Face Segmentation, Face Swapping, and Face Perception , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[6]  Jessica J. Fridrich,et al.  Rich Models for Steganalysis of Digital Images , 2012, IEEE Transactions on Information Forensics and Security.

[7]  Scott McCloskey,et al.  Detecting GAN-generated Imagery using Color Cues , 2018, ArXiv.

[8]  Junichi Yamagishi,et al.  MesoNet: a Compact Facial Video Forgery Detection Network , 2018, 2018 IEEE International Workshop on Information Forensics and Security (WIFS).

[9]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[12]  Christian Riess,et al.  Exploiting Visual Artifacts to Expose Deepfakes and Face Manipulations , 2019, 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW).

[13]  Bin Li,et al.  Detection of Deep Network Generated Images Using Disparities in Color Components , 2018, ArXiv.

[14]  Larry S. Davis,et al.  Two-Stream Neural Networks for Tampered Face Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15]  Edward J. Delp,et al.  Deepfake Video Detection Using Recurrent Neural Networks , 2018, 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Belhassen Bayar,et al.  Constrained Convolutional Neural Networks: A New Approach Towards General Purpose Image Manipulation Detection , 2018, IEEE Transactions on Information Forensics and Security.

[18]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[19]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[20]  Siwei Lyu,et al.  Exposing DeepFake Videos By Detecting Face Warping Artifacts , 2018, CVPR Workshops.

[21]  Jean-Luc Dugelay,et al.  Face aging with conditional generative adversarial networks , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[22]  Larry S. Davis,et al.  Learning Rich Features for Image Manipulation Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.