Group-Level Emotion Recognition Using a Unimodal Privacy-Safe Non-Individual Approach

This article presents our unimodal, privacy-safe, and non-individual approach for the audio-video group emotion recognition sub-challenge of the Emotion Recognition in the Wild (EmotiW) Challenge 2020. This sub-challenge aims to classify in-the-wild videos into three categories: Positive, Neutral, and Negative. Recent deep learning models have shown tremendous advances in analyzing interactions between people, predicting human behavior, and evaluating affect. Nonetheless, their performance relies on individual-based analysis: scores from per-person detections are summed or averaged, which inevitably raises privacy issues. In this research, we investigated a frugal approach: a model that captures the global mood of a scene from the whole image, without face detection, pose detection, or any other individual-based feature as input. The proposed methodology mixes state-of-the-art datasets and dedicated synthetic corpora as training sources. After an in-depth exploration of neural network architectures for group-level emotion recognition, we built a VGG-based model that achieves 59.13% accuracy on the VGAF test set (eleventh place in the challenge). Given that the analysis is unimodal, based only on global features, and evaluated on a real-world dataset, these results are promising and allow us to envision extending this model to multimodality for classroom ambiance evaluation, our final target application.
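
For illustration, the sketch below shows how such a whole-image, three-class classifier could be set up in PyTorch. The abstract only states that the model is VGG-based and operates on the global image; the specific backbone variant (VGG-16), input resolution, ImageNet pretraining, and the frame-level aggregation at the end are assumptions for this sketch, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class GroupEmotionVGG(nn.Module):
    """Whole-frame Positive/Neutral/Negative classifier (no per-person crops)."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # ImageNet-pretrained VGG-16 backbone (assumed variant).
        self.backbone = models.vgg16(pretrained=True)
        # Swap the final fully connected layer for a three-class head.
        in_features = self.backbone.classifier[-1].in_features
        self.backbone.classifier[-1] = nn.Linear(in_features, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, 224, 224) global images; no face or pose inputs.
        return self.backbone(frames)

# Hypothetical video-level inference: average frame logits over a clip.
model = GroupEmotionVGG().eval()
clip = torch.randn(16, 3, 224, 224)          # 16 sampled frames from one video
with torch.no_grad():
    video_logits = model(clip).mean(dim=0)   # (3,) Positive/Neutral/Negative
label = video_logits.argmax().item()
```

Averaging frame-level logits over sampled frames is one plausible way to obtain a video-level label; the abstract does not specify the temporal aggregation actually used.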
