A gaze-based method for relating group involvement to individual engagement in multimodal multiparty dialogue

This paper is concerned with modelling individual engagement and group involvement, as well as the relationship between them, in an eight-party, multimodal corpus. We propose a number of features (presence, entropy, symmetry and maxgaze) that summarise different aspects of eye-gaze patterns and allow us to describe individual as well as group behaviour over time. We use these features to define similarities between the subjects, and we compare this information with the engagement rankings the subjects gave at the end of each interaction for themselves and the other participants. We analyse how these features relate to four classes of group involvement, and we build a classifier that distinguishes between those classes with 71% accuracy.
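
To make the gaze features concrete, the sketch below computes presence, entropy, symmetry and maxgaze over one time window of frame-level gaze annotations. The feature names come from the abstract, but the precise definitions used here (presence as being gazed at by at least one other participant, symmetry as mutual gaze, maxgaze as the largest share of the subject's gaze spent on a single target) are plausible assumptions chosen for illustration, not the paper's formulas.

from collections import Counter
from math import log2

def gaze_features(gaze, subject):
    """Summarise one time window of gaze data for one participant.

    gaze: dict mapping each participant ID to a list of gaze targets,
          one label per video frame; a target is another participant's
          ID, or None when the gaze is not on any person. All lists
          are assumed to have the same length (one entry per frame).
    subject: the participant the features are computed for.
    """
    frames = len(gaze[subject])
    others = [p for p in gaze if p != subject]

    # presence: fraction of frames in which at least one other
    # participant looks at the subject (assumed definition).
    presence = sum(
        any(gaze[p][t] == subject for p in others) for t in range(frames)
    ) / frames

    # Distribution of the subject's own gaze over targets.
    counts = Counter(t for t in gaze[subject] if t is not None)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()] if total else []

    # entropy: how evenly the subject spreads gaze over targets
    # (0 when all gaze goes to a single person).
    entropy = -sum(p * log2(p) for p in probs)

    # maxgaze: largest fraction of the subject's gaze spent on any
    # single target (assumed definition).
    maxgaze = max(probs, default=0.0)

    # symmetry: fraction of frames with mutual gaze between the
    # subject and whoever they are looking at (assumed definition).
    symmetry = sum(
        gaze[subject][t] is not None and gaze[gaze[subject][t]][t] == subject
        for t in range(frames)
    ) / frames

    return {"presence": presence, "entropy": entropy,
            "symmetry": symmetry, "maxgaze": maxgaze}

In a pipeline along the lines the abstract describes, such window-level summaries would be computed for every participant and time window, then compared across subjects and fed to the group-involvement classifier.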
