Analyzing the Affect of a Group of People Using a Multi-modal Framework

Millions of images on the web from social events such as family parties make it both possible and worthwhile to understand and model the affect exhibited by a group of people in an image. Analyzing the affect expressed by multiple people is challenging, however, because of varied indoor and outdoor settings and interactions among varying numbers of people. A few existing works on Group-level Emotion Recognition (GER) have investigated face-level information, but in such challenging environments faces alone may not provide enough information for GER, and relatively few studies have investigated multi-modal GER. We therefore propose a novel multi-modal approach, based on a new feature description, for understanding the emotional state of a group of people in an image. In this paper, we first exploit three kinds of rich information in a group-level image: face, upper body, and scene. Furthermore, to integrate the information of multiple people in a group-level image, we propose an information aggregation method that generates one feature each for face, upper body, and scene. We then fuse the face, upper-body, and scene information to make GER robust to the challenging environments. Extensive experiments are performed on two challenging group-level emotion databases to investigate the roles of face, upper body, and scene, as well as of the multi-modal framework. Experimental results demonstrate that our framework achieves very promising performance for GER.
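The pipeline described above can be summarized as: extract per-person face and upper-body descriptors plus a whole-image scene descriptor, aggregate the per-person descriptors into one group-level vector per modality, and fuse the three modality vectors before classification. The following is a minimal sketch of that structure under simple assumptions (mean pooling as the aggregation step, concatenation as the fusion step, and random vectors standing in for real face/upper-body/scene descriptors); it illustrates the data flow only and is not the authors' exact method.

```python
# Sketch of a multi-modal group-level emotion pipeline (assumed structure).
# Any face / upper-body / scene extractor returning fixed-length vectors
# could replace the random stand-ins below.
import numpy as np
from sklearn.svm import SVC

def aggregate(person_features: np.ndarray) -> np.ndarray:
    """Aggregate per-person descriptors (n_people x d) into one group-level
    vector. Mean pooling is used here purely as an illustrative choice."""
    return person_features.mean(axis=0)

def fuse(face_feat, body_feat, scene_feat) -> np.ndarray:
    """Feature-level fusion of the three modalities by concatenation."""
    return np.concatenate([face_feat, body_feat, scene_feat])

# --- toy example: random descriptors in place of real feature extractors ---
rng = np.random.default_rng(0)

def random_group_descriptor(n_people, d_face=64, d_body=32, d_scene=128):
    face = aggregate(rng.normal(size=(n_people, d_face)))   # per-person faces
    body = aggregate(rng.normal(size=(n_people, d_body)))   # per-person bodies
    scene = rng.normal(size=d_scene)                         # one scene vector
    return fuse(face, body, scene)

X = np.stack([random_group_descriptor(rng.integers(2, 8)) for _ in range(40)])
y = rng.integers(0, 3, size=40)   # e.g. negative / neutral / positive labels

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X[:5]))
```

A linear SVM is used here only as a generic classifier; any fusion classifier (e.g. multiple kernel learning over the three modality features) could take its place.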
