Investigation of Small Group Social Interactions Using Deep Visual Activity-Based Nonverbal Features

Understanding small group face-to-face interactions is a prominent research problem in social psychology, and its automatic analysis has recently become popular in social computing. Such interactions are mainly investigated in terms of nonverbal behaviors, as these are one of the main facets of communication. Among the many multimodal nonverbal cues, visual activity is an important one, and its performance can be crucial, for instance, when audio sensors are missing. The existing visual activity-based nonverbal features, all of which are hand-crafted, perform well enough for some applications but not for others. Given these observations, we claim that there is a need for more robust feature representations that can be learned from the data itself. To this end, we propose a novel method composed of optical flow computation, deep neural network-based feature learning, feature encoding, and classification. A comprehensive comparison of different feature encoding techniques is also presented. The proposed method is tested on three research topics that can be observed during small group interactions, i.e., meetings: i) emergent leader detection, ii) emergent leadership style prediction, and iii) high/low extraversion classification. The proposed method shows (significantly) better results not only compared to the state-of-the-art visual activity-based nonverbal features, but also compared to the case in which those features are combined with other audio-based and video-based nonverbal features.
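The last two stages of such a pipeline (feature encoding and classification) can be illustrated with a minimal sketch. It is not the method proposed here: it assumes that per-frame feature vectors have already been produced by some earlier stage, it uses mean and standard-deviation pooling as a stand-in for the encoding techniques the paper compares, and it uses a nearest-centroid rule as a stand-in classifier. The function names `encode`, `fit_centroids`, and `predict`, and the toy numbers, are hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def encode(frames):
    """Pool a variable-length list of per-frame feature vectors into one
    fixed-length vector: per-dimension means followed by per-dimension
    population standard deviations (an assumed, simple encoding)."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    return [mean(d) for d in dims] + [pstdev(d) for d in dims]


def fit_centroids(encoded, labels):
    """Average the encoded vectors of each class (stand-in classifier)."""
    by_label = {}
    for vec, lab in zip(encoded, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: [mean(col) for col in zip(*vecs)]
            for lab, vecs in by_label.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))


# Toy, made-up per-frame features for two participants (not real data).
train = [
    encode([[0.9, 0.8], [1.0, 0.7]]),              # participant labeled "high"
    encode([[0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]),  # participant labeled "low"
]
centroids = fit_centroids(train, ["high", "low"])
label = predict(centroids, encode([[0.8, 0.9], [0.9, 0.9], [1.0, 0.8]]))
```

The point of the encoding step is that participants observed for different numbers of frames still map to vectors of the same length, so any standard classifier can be applied afterwards.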
