Automatic social interaction analysis with audio and visual nonverbal cues

In this paper, we review human social interaction analysis based on audio and visual nonverbal cues. As an example, we also present our own study on automatic dominance estimation in small group conversations. We extracted low-level audio and visual features designed to parallel the nonverbal cues that, according to the social psychology literature, dominant people display. We show that simple features and simple classifiers achieve a performance of around 85–90% in estimating the most and the least dominant person. We also show that audio features alone yield high accuracy, whereas visual features are needed to make the dominance estimates more accurate.
