Affect-Aware Deep Belief Network Representations for Multimodal Unsupervised Deception Detection

Automated systems that detect the social behavior of deception can enhance human well-being across medical, social work, and legal domains. Labeled datasets to train supervised deception detection models can rarely be collected for real-world, high-stakes contexts. To address this challenge, we propose the first unsupervised approach for detecting real-world, high-stakes deception in videos without requiring labels. This paper presents our novel approach for affect-aware unsupervised Deep Belief Networks (DBN) to learn discriminative representations of deceptive and truthful behavior. Drawing on psychological theories that link affect and deception, we experimented with unimodal and multimodal DBN-based approaches trained on facial valence, facial arousal, audio, and visual features. In addition to using facial affect as a feature on which DBN models are trained, we also introduce a DBN training procedure that uses facial affect as an aligner of audio-visual representations. We conducted classification experiments with unsupervised Gaussian Mixture Model clustering to evaluate our approaches. Our best unsupervised approach (trained on facial valence and visual features) achieved an AUC of 80%, outperforming human ability and performing comparably to fully supervised models. Our results motivate future work on unsupervised, affect-aware computational approaches for detecting deception and other social behaviors in the wild.
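
The pipeline described above (unsupervised DBN representations, followed by Gaussian Mixture Model clustering and AUC-based evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's BernoulliRBM as a stand-in for each DBN layer, and the layer sizes, learning rate, diagonal covariances, and function names are hypothetical choices. Because cluster identities are arbitrary, the sketch reports the AUC under the better of the two cluster-to-label assignments, which is also an assumption; labels are used only for evaluation, never for training.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import MinMaxScaler


def dbn_features(X, layer_sizes=(16, 8), seed=0):
    """Greedy layer-wise stack of RBMs (hypothetical sizes); no labels used."""
    # BernoulliRBM expects inputs in [0, 1], so rescale each feature first.
    H = MinMaxScaler().fit_transform(X)
    for i, n_hidden in enumerate(layer_sizes):
        rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                           n_iter=20, random_state=seed + i)
        # Hidden activations of this layer become the input to the next layer.
        H = rbm.fit_transform(H)
    return H


def cluster_scores(H, seed=0):
    """Fit a two-component GMM and return each sample's posterior for component 1."""
    gmm = GaussianMixture(n_components=2, covariance_type="diag",
                          reg_covar=1e-4, random_state=seed)
    gmm.fit(H)
    return gmm.predict_proba(H)[:, 1]


def unsupervised_auc(y_true, scores):
    """AUC under the better of the two possible cluster-to-label assignments."""
    auc = roc_auc_score(y_true, scores)
    return max(auc, 1.0 - auc)


# Synthetic stand-in for per-video features (e.g. facial valence plus visual features).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 20)) + 1.5 * y[:, None]

scores = cluster_scores(dbn_features(X))
print("AUC:", unsupervised_auc(y, scores))
```

The synthetic data only exercises the code path; it says nothing about performance on real deception data.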
