Smile Intensity Detection in Multiparty Interaction using Deep Learning

Recognizing expressed emotion is important for enabling decision making in autonomous agents and systems designed to interact with humans. In this paper, we present our experience developing a software component for smile intensity detection in multiparty interaction. We first describe the deep learning architecture and training process in detail, then analyze the results obtained from testing the trained network, and finally outline the steps taken to implement and visualize the network in a real-time software component.
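As a rough illustration of the kind of component described above, a smile-intensity detector maps a cropped face image to a scalar intensity in [0, 1]. The sketch below is hypothetical (the function name, linear model, and feature choice are assumptions, not the paper's architecture); a real system would replace the linear scoring step with a trained deep network:

```python
import numpy as np

def estimate_smile_intensity(face_crop, weights, bias):
    """Hypothetical sketch: flatten a face crop, normalize pixel values,
    apply a linear score, and squash with a sigmoid to get an intensity
    in [0, 1]. A trained deep network would replace the linear step."""
    x = face_crop.astype(np.float32).ravel() / 255.0   # normalize pixels
    logit = float(x @ weights + bias)                  # linear score
    return 1.0 / (1.0 + np.exp(-logit))                # squash to [0, 1]

# Toy usage with random weights; no trained model is implied.
rng = np.random.default_rng(0)
crop = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
w = rng.normal(scale=0.01, size=crop.size)
intensity = estimate_smile_intensity(crop, w, 0.0)
```

In a real-time pipeline, this call would sit behind a face detector and run once per detected face per frame, with the resulting intensities overlaid on the video for visualization.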
