Use of a Capsule Network to Detect Fake Images and Videos

The revolution in computer hardware, especially in graphics processing units and tensor processing units, has enabled significant advances in computer graphics and artificial intelligence algorithms. Although they have many beneficial applications in daily life and business, computer-generated and computer-manipulated images and videos can also be used for malicious purposes that compromise security systems, privacy, and social trust. The deepfake phenomenon and its variations enable an ordinary user with a personal computer to easily create a fake video of anybody from a short real video found online. Several countermeasures have been introduced to deal with attacks using such videos. However, most of them target specific domains and are ineffective when applied to other domains or to new attacks. In this paper, we introduce a capsule network that can detect various kinds of attacks, from presentation attacks using printed images and replayed videos to attacks using fake videos created with deep learning. It uses far fewer parameters than traditional convolutional neural networks that achieve similar performance. Moreover, we explain, for the first time in the literature, the theory behind the application of capsule networks to the forensics problem through detailed analysis and visualization.
