Improved Visual Focus of Attention Estimation and Prosodic Features for Analyzing Group Interactions

Collaborative group tasks require efficient and productive verbal and non-verbal interactions among the participants. Studying such interaction patterns could help groups perform more efficiently, but detecting and measuring human behavior is challenging because it is inherently multimodal and changes on a millisecond time scale. In this paper, we present a method to study groups performing a collaborative decision-making task using non-verbal behavioral cues. First, we present a novel algorithm to estimate the visual focus of attention (VFOA) of participants using frontal cameras. The algorithm can be used in various group settings and achieves a state-of-the-art accuracy of 90%. Second, we present prosodic features for non-verbal speech analysis. These features are commonly used in speech/music classification tasks but are rarely used in human group interaction analysis. We validate our algorithms on a multimodal dataset of 14 group meetings with 45 participants, and show that a combination of VFOA-based visual metrics and prosodic-feature-based metrics can predict emergent group leaders with 64% accuracy and dominant contributors with 86% accuracy. We also report correlations between the non-verbal behavioral metrics and gender, emotional intelligence, and the Big 5 personality traits.
