Deepfake Video Detection Using Recurrent Neural Networks

In recent months, a free machine learning-based software tool has made it easy to create believable face swaps in videos that leave few traces of manipulation, known as "deepfake" videos. It is easy to envision scenarios in which these realistic fake videos are used to create political distress, blackmail someone, or fake terrorism events. This paper proposes a temporal-aware pipeline to automatically detect deepfake videos. Our system uses a convolutional neural network (CNN) to extract frame-level features. These features are then used to train a recurrent neural network (RNN) that learns to classify whether a video has been manipulated. We evaluate our method against a large set of deepfake videos collected from multiple video websites. We show that our system achieves competitive results on this task while using a simple architecture.
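
The data flow described above (per-frame features fed to a recurrent classifier) can be illustrated with a minimal, untrained sketch. This is not the authors' implementation: the abstract does not specify the architecture, so `frame_features` is a hypothetical stand-in for the CNN (it only computes the mean and standard deviation of pixel intensities), `TinyLSTM` is a small LSTM cell with random weights chosen here as one possible RNN, and the output weights are arbitrary. The names, sizes, and dummy video are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frame_features(frame):
    """Hypothetical stand-in for the CNN: maps one grayscale frame
    (a list of rows of floats) to a 2-dimensional feature vector."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, math.sqrt(var)]


class TinyLSTM:
    """Minimal LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        self.hidden_size = hidden_size
        # One weight matrix and bias vector per gate:
        # input (i), forget (f), output (o), candidate (g).
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _gate(self, name, activation, z):
        return [
            activation(sum(w * v for w, v in zip(row, z)) + bias)
            for row, bias in zip(self.W[name], self.b[name])
        ]

    def step(self, x, h, c):
        z = x + h  # concatenate current input and previous hidden state
        i = self._gate("i", sigmoid, z)
        f = self._gate("f", sigmoid, z)
        o = self._gate("o", sigmoid, z)
        g = self._gate("g", math.tanh, z)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def classify_video(frames, lstm, w_out, b_out):
    """Return a score in (0, 1), read here as P(manipulated)."""
    h = [0.0] * lstm.hidden_size
    c = [0.0] * lstm.hidden_size
    for frame in frames:
        # Frame-level features are consumed in temporal order.
        h, c = lstm.step(frame_features(frame), h, c)
    # Logistic read-out on the final hidden state.
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


def make_dummy_video(num_frames, height, width, seed=0):
    """Random frames standing in for decoded, cropped video frames."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(num_frames)]


video = make_dummy_video(num_frames=5, height=4, width=4, seed=1)
score = classify_video(video, TinyLSTM(2, 3, seed=0), [0.1, -0.2, 0.3], 0.0)
print(f"untrained manipulation score: {score:.3f}")
```

Because the weights are random, the printed score carries no meaning. In a real system the feature extractor would be a trained CNN and the recurrent weights would be learned from labeled pristine and manipulated videos.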
