Towards Understanding Perceptual Differences between Genuine and Face-Swapped Videos

In this paper, we report on perceptual experiments indicating that there are distinct and quantitatively measurable differences in the way we visually perceive genuine versus face-swapped videos. Recent progress in deep learning has made face-swapping techniques a powerful tool for creative purposes, but also a means for unethical forgeries. Currently, it remains unclear why people are misled, and which indicators they use to recognize potential manipulations. Here, we conduct three perceptual experiments focusing on a wide range of aspects: the conspicuousness of artifacts, the viewing behavior using eye tracking, the recognition accuracy for different video lengths, and the assessment of emotions. Our experiments show that responses differ distinctly when watching manipulated as opposed to original faces, from which we derive perceptual cues to recognize face swaps. By investigating physiologically measurable signals, our findings yield valuable insights that may also be useful for advanced algorithmic detection.
