Non-Volume Preserving-based Feature Fusion Approach to Group-Level Expression Recognition on Crowd Videos

Group-level emotion recognition is a growing research area, as the demand for assessing crowds of all sizes continues to rise in both the security arena and on social media. This work investigates group-level expression recognition on crowd videos, where information is aggregated not only across a variable-length sequence of frames but also over the set of faces within each frame to produce a single group-level prediction. In this paper, we propose an effective deep feature-level fusion mechanism, Non-Volume Preserving Fusion (NVPF), to model the spatio-temporal information in crowd videos. Furthermore, we extend NVPF to a Temporal NVPF (TNVPF) approach that learns the temporal information between frames. To demonstrate the robustness and effectiveness of each component of the proposed approach, three experiments were conducted: (i) evaluation on the AffectNet database to benchmark the proposed emoNet for facial expression recognition; (ii) evaluation on EmotiW2018 to benchmark the proposed deep feature-level fusion mechanism NVPF; and (iii) evaluation of the proposed TNVPF on a newly collected Group-level Emotion on Crowd Videos (GECV) dataset composed of 627 videos gathered from social media. Each GECV video is 10 to 20 seconds long, contains a crowd of twenty or more subjects, and is labeled as positive, negative, or neutral.
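To make the two-level aggregation concrete, the following is a minimal sketch (not the authors' implementation) of how per-face features can be pooled within each frame and then fused across a variable-length sequence of frames with an NVP-style affine coupling step before clip-level classification. The 256-d embedding size, the mean-pooling choice, the tiny coupling networks, and the linear classifier head are illustrative assumptions.

```python
# Minimal sketch, assuming a hypothetical per-face feature extractor (e.g. the
# emoNet embeddings mentioned above) has already produced one feature vector
# per detected face. Architecture details here are illustrative assumptions,
# not the authors' exact design.
from typing import List

import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """RealNVP-style affine coupling block, used here as a stand-in for the
    non-volume preserving (NVP) fusion step: half of the feature vector
    conditions a scale-and-shift transform of the other half."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        half = dim // 2
        self.scale = nn.Sequential(nn.Linear(half, hidden), nn.ReLU(),
                                   nn.Linear(hidden, half), nn.Tanh())
        self.shift = nn.Sequential(nn.Linear(half, hidden), nn.ReLU(),
                                   nn.Linear(hidden, half))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=-1)
        y2 = x2 * torch.exp(self.scale(x1)) + self.shift(x1)
        return torch.cat([x1, y2], dim=-1)


class GroupVideoFusion(nn.Module):
    """Pool face embeddings within each frame, fuse the pooled frame features
    over time with the coupling block, and classify the whole clip."""

    def __init__(self, feat_dim: int = 256, num_classes: int = 3):
        super().__init__()
        self.coupling = AffineCoupling(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, face_feats: List[torch.Tensor]) -> torch.Tensor:
        # face_feats: one (num_faces_t, feat_dim) tensor per frame; both the
        # number of faces and the number of frames vary between videos.
        frame_feats = torch.stack([f.mean(dim=0) for f in face_feats])  # (T, D)
        fused = self.coupling(frame_feats).mean(dim=0)                  # (D,)
        return self.classifier(fused)  # logits for positive/neutral/negative


# Usage on dummy data: a 5-frame clip with a varying number of detected faces.
model = GroupVideoFusion()
clip = [torch.randn(int(torch.randint(20, 40, (1,))), 256) for _ in range(5)]
print(model(clip).shape)  # torch.Size([3])
```

Mean pooling is only one way to collapse a variable number of faces per frame; an attention-weighted pooling or a recurrent unit over frames would slot into the same interface without changing the overall two-level aggregation scheme.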
