Exploiting Prediction Error Inconsistencies through LSTM-based Classifiers to Detect Deepfake Videos

Artificial intelligence techniques can now synthesize entirely new videos or alter the facial expressions in existing ones, as has been amply demonstrated in the literature. Identifying this new threat, generally known as Deepfake although it comprises several different techniques, is fundamental in multimedia forensics, because such manipulated content can easily distort public opinion about a person or an event. This paper therefore introduces a new technique for distinguishing synthetically generated portrait videos from natural ones by exploiting inconsistencies in the prediction error that arise in the re-encoding phase. In particular, features based on the inter-frame prediction error are investigated jointly with a Long Short-Term Memory (LSTM) network that learns the temporal correlation among consecutive frames. Preliminary results show that this sequence-based approach to distinguishing original from manipulated videos achieves promising performance.
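To make the idea of sequence features built from inter-frame prediction error concrete, the following is a minimal sketch and not the method used in the paper. It assumes a zero-motion predictor (each frame is predicted by the previous one) in place of the codec's motion-compensated prediction, and it summarizes each residual by its mean absolute value. The function names `prediction_error_features` and `make_sequences`, the window length, and the random toy video are all hypothetical choices made for illustration.

```python
import numpy as np


def prediction_error_features(frames):
    """Mean absolute prediction error per frame.

    Assumption: the previous frame is used as the predictor (zero motion),
    a simplified stand-in for the motion-compensated prediction of a codec.
    """
    frames = np.asarray(frames, dtype=np.float64)
    residual = frames[1:] - frames[:-1]          # one residual per frame pair
    return np.abs(residual).mean(axis=(1, 2))    # shape: (num_frames - 1,)


def make_sequences(features, seq_len):
    """Overlapping windows of consecutive features, shape (n, seq_len, 1).

    This is the (batch, time, feature) layout that a sequence model such as
    an LSTM would typically consume.
    """
    n = len(features) - seq_len + 1
    if n <= 0:
        return np.empty((0, seq_len, 1))
    windows = np.stack([features[i:i + seq_len] for i in range(n)])
    return windows[..., np.newaxis]


# Toy grayscale "video": 12 frames of 16x16 random pixels (hypothetical data).
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(12, 16, 16))

feats = prediction_error_features(video)
seqs = make_sequences(feats, seq_len=5)
print(feats.shape, seqs.shape)  # (11,) (7, 5, 1)
```

Each window in `seqs` could then be passed to an LSTM classifier that outputs an original-versus-manipulated decision. The classifier itself is omitted here because its architecture is not specified in this abstract.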
