Detecting Deception from Gaze and Speech Using a Multimodal Attention LSTM-Based Framework

The automatic detection of deceptive behavior has recently attracted attention from the research community because of the range of areas in which it can play a crucial role, such as security and criminology. This work develops an automatic deception detection system based on gaze and speech features. Our first contribution is the use of attention Long Short-Term Memory (LSTM) networks for single-modal systems that take frame-level features as input. Our second contribution is a multimodal system that combines the gaze and speech modalities within the LSTM architecture using two different combination strategies: Late Fusion and Attention-Pooling Fusion. The proposed models are evaluated on the Bag-of-Lies dataset, a multimodal database recorded in real conditions. The results show, first, that attentional LSTM networks adequately model the gaze and speech feature sequences, outperforming a reference Support Vector Machine (SVM)-based system with compact features. Second, both combination strategies produce better results than the single-modal systems and the multimodal reference system, which suggests that gaze and speech carry complementary information for deception detection that LSTMs can exploit effectively.
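As a rough illustration of the two combination strategies named above, the following NumPy sketch shows attention pooling over frame-level hidden states and two ways of merging the modalities. It is a minimal sketch under stated assumptions, not the implementation evaluated in this work: the function names, the dot-product scoring, the equal fusion weight, and the reading of Attention-Pooling Fusion as a concatenation of the pooled vectors are all hypothetical, and the hidden states stand in for the outputs of per-modality LSTMs.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(hidden, w):
    """Collapse frame-level states (T, D) into one utterance-level vector (D,).

    `w` is a learned attention vector of shape (D,); dot-product scoring
    is an assumption made for this sketch.
    """
    alpha = softmax(hidden @ w)      # one weight per frame, sums to 1
    return alpha @ hidden, alpha     # weighted average of the frames

def late_fusion(p_gaze, p_speech, weight=0.5):
    """Combine the class posteriors of two independently trained systems.

    The equal weighting is an assumption, not a value from the paper.
    """
    return weight * p_gaze + (1.0 - weight) * p_speech

def attention_pooling_fusion(h_gaze, h_speech, w_gaze, w_speech):
    """Pool each modality separately, then concatenate the pooled vectors.

    One plausible reading of fusion at the attention-pooling level; the
    joint vector would then feed a shared classifier (not shown).
    """
    v_gaze, _ = attention_pool(h_gaze, w_gaze)
    v_speech, _ = attention_pool(h_speech, w_speech)
    return np.concatenate([v_gaze, v_speech])
```

The two modalities may have different frame rates and feature sizes, which is why each one is pooled along its own time axis before any combination takes place. With an all-zero attention vector the weights become uniform, so `attention_pool` reduces to plain mean pooling over frames.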
