Exploiting Human Social Cognition for the Detection of Fake and Fraudulent Faces via Memory Networks

Advances in computer vision have brought us to the point where we have the ability to synthesise realistic fake content. Such approaches are seen as a source of disinformation and mistrust, and pose serious concerns to governments around the world. Convolutional Neural Networks (CNNs) demonstrate encouraging results when detecting fake images that arise from the specific type of manipulation they are trained on. However, this success has not transitioned to unseen manipulation types, resulting in a significant gap in the line-of-defense. We propose a Hierarchical Memory Network (HMN) architecture, which is able to successfully detect faked faces by utilising knowledge stored in neural memories as well as visual cues to reason about the perceived face and anticipate its future semantic embeddings. This renders a generalisable face tampering detection framework. Experimental results demonstrate the proposed approach achieves superior performance for fake and fraudulent face detection compared to the state-of-the-art.

[1]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[2]  Siwei Lyu,et al.  In Ictu Oculi: Exposing AI Created Fake Videos by Detecting Eye Blinking , 2018, 2018 IEEE International Workshop on Information Forensics and Security (WIFS).

[3]  Hongyu Guo,et al.  Long Short-Term Memory Over Recursive Structures , 2015, ICML.

[4]  Sridha Sridharan,et al.  Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection , 2017, Neural Networks.

[5]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[6]  Patrick Pérez,et al.  VDub: Modifying Face Video of Actors for Plausible Visual Alignment to a Dubbed Audio Track , 2015, Comput. Graph. Forum.

[7]  Junichi Yamagishi,et al.  Capsule-forensics: Using Capsule Networks to Detect Forged Images and Videos , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  David Amos,et al.  Generative Temporal Models with Memory , 2017, ArXiv.

[10]  Sridha Sridharan,et al.  Task Specific Visual Saliency Prediction with Memory Augmented Conditional Generative Adversarial Networks , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[11]  Kiran B. Raja,et al.  Fake Face Detection Methods: Can They Be Generalized? , 2018, 2018 International Conference of the Biometrics Special Interest Group (BIOSIG).

[12]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[13]  Davide Maltoni,et al.  The magic passport , 2014, IEEE International Joint Conference on Biometrics.

[14]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[15]  Wojciech Matusik,et al.  Video face replacement , 2011, ACM Trans. Graph..

[16]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[17]  G. Yovel,et al.  TMS Evidence for the Involvement of the Right Occipital Face Area in Early Face Processing , 2007, Current Biology.

[18]  Shelley E. Taylor,et al.  Social Cognition, from Brains to Culture , 1984 .

[19]  Anna Weinberg,et al.  Mind Perception: Real but Not Artificial Faces Sustain Neural Activity beyond the N170/VPP , 2011, PloS one.

[20]  Sridha Sridharan,et al.  Tree Memory Networks for Modelling Long-term Temporal Dependencies , 2017, Neurocomputing.

[21]  Daniel Cohen-Or,et al.  Bringing portraits to life , 2017, ACM Trans. Graph..

[22]  Larry S. Davis,et al.  Two-Stream Neural Networks for Tampered Face Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[23]  Ira Kemelmacher-Shlizerman,et al.  Synthesizing Obama , 2017, ACM Trans. Graph..

[24]  Mario Fritz,et al.  A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.

[25]  Davide Cozzolino,et al.  Recasting Residual-based Local Descriptors as Convolutional Neural Networks: an Application to Image Forgery Detection , 2017, IH&MMSec.

[26]  Qi Zhao,et al.  Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[28]  Junichi Yamagishi,et al.  MesoNet: a Compact Facial Video Forgery Detection Network , 2018, 2018 IEEE International Workshop on Information Forensics and Security (WIFS).

[29]  M. Botsch,et al.  Differential effects of face-realism and emotion on event-related brain potentials and their implications for the uncanny valley theory , 2017, Scientific Reports.

[30]  Sankar K. Pal,et al.  Multilayer perceptron, fuzzy sets, and classification , 1992, IEEE Trans. Neural Networks.

[31]  Patrick Pérez,et al.  Automatic Face Reenactment , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[33]  Junichi Yamagishi,et al.  Distinguishing computer graphics from natural images using convolution neural networks , 2017, 2017 IEEE Workshop on Information Forensics and Security (WIFS).

[34]  Jing Dong,et al.  Position Determines Perspective: Investigating Perspective Distortion for Image Forensics of Faces , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[35]  Justus Thies,et al.  Face2Face: real-time face capture and reenactment of RGB videos , 2019, Commun. ACM.

[36]  Belhassen Bayar,et al.  A Deep Learning Approach to Universal Image Manipulation Detection Using a New Convolutional Layer , 2016, IH&MMSec.

[37]  Dong-Ming Yan,et al.  Distinguishing Between Natural and Computer-Generated Images Using Convolutional Neural Networks , 2018, IEEE Transactions on Information Forensics and Security.

[38]  Ruslan Salakhutdinov,et al.  Neural Map: Structured Memory for Deep Reinforcement Learning , 2017, ICLR.

[39]  Jing Dong,et al.  Optimized 3D Lighting Environment Estimation for Image Forgery Detection , 2017, IEEE Transactions on Information Forensics and Security.

[40]  Guillaume Lample,et al.  Fader Networks: Manipulating Images by Sliding Attributes , 2017, NIPS.

[41]  Diyi Yang,et al.  Hierarchical Attention Networks for Document Classification , 2016, NAACL.

[42]  Arun Ross,et al.  An introduction to biometric recognition , 2004, IEEE Transactions on Circuits and Systems for Video Technology.

[43]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[44]  Hong Yu,et al.  Neural Semantic Encoders , 2016, EACL.

[45]  Patrick Pérez,et al.  Deep video portraits , 2018, ACM Trans. Graph..

[46]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Razvan Pascanu,et al.  Theano: A CPU and GPU Math Compiler in Python , 2010, SciPy.

[48]  Junichi Yamagishi,et al.  Multi-task Learning for Detecting and Segmenting Manipulated Facial Images and Videos , 2019, 2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[49]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  N. Kanwisher,et al.  The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception , 1997, The Journal of Neuroscience.

[51]  Andreas Rössler,et al.  FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces , 2018, ArXiv.

[52]  Robert M. Chesney,et al.  Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security , 2018 .

[53]  Sridha Sridharan,et al.  Pedestrian Trajectory Prediction with Structured Memory Hierarchies , 2018, ECML/PKDD.

[54]  Andreas Rössler,et al.  FaceForensics++: Learning to Detect Manipulated Facial Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[55]  Silvio Savarese,et al.  Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[56]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[57]  T. Allison,et al.  Social perception from visual cues: role of the STS region , 2000, Trends in Cognitive Sciences.

[58]  Junichi Yamagishi,et al.  Modular Convolutional Neural Network for Discriminating between Computer-Generated Images and Photographic Images , 2018, ARES.

[59]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[60]  Premkumar Natarajan,et al.  Recurrent Convolutional Strategies for Face Manipulation Detection in Videos , 2019, CVPR Workshops.

[61]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Matti Pietikäinen,et al.  Face Description with Local Binary Patterns: Application to Face Recognition , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63]  Alex Graves,et al.  Neural Turing Machines , 2014, ArXiv.

[64]  Andreas Rössler,et al.  ForensicTransfer: Weakly-supervised Domain Adaptation for Forgery Detection , 2018, ArXiv.

[65]  Silvio Savarese,et al.  SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Robert Pless,et al.  Deep Feature Interpolation for Image Content Changes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Daan Wierstra,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[68]  Sridha Sridharan,et al.  Learning Temporal Strategic Relationships using Generative Adversarial Imitation Learning , 2018, AAMAS.

[69]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Richard Socher,et al.  Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.

[71]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.