On the Use of Multimodal Cues for the Prediction of Degrees of Involvement in Spontaneous Conversation

Quantifying the degree of involvement of a group of participants in a conversation is a task that humans accomplish every day, but one that machines cannot yet perform. In this study we first investigate the correlation between visual cues (gaze and blinking rate) and involvement. We then test the suitability of prosodic cues (acoustic model) as well as gaze and blinking (visual model) for predicting the degree of involvement using a support vector machine (SVM). We also test whether fusing the acoustic and visual models improves the prediction. We show that we are able to predict three classes of involvement with an error-rate reduction of 0.30 (accuracy = 0.68).
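
The abstract does not specify the exact feature set, kernel, or fusion scheme, so the following is only a minimal sketch of one plausible setup: feature-level fusion, where per-segment acoustic (prosodic) and visual (gaze/blink) feature vectors are concatenated and passed to a multiclass SVM. It uses scikit-learn and synthetic data; all feature names and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): three-class involvement
# prediction with an SVM via simple feature-level fusion of
# acoustic and visual cues. Data and features are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300

# Hypothetical per-segment features.
acoustic = rng.normal(size=(n, 4))   # e.g. f0 mean/range, intensity, speech rate
visual   = rng.normal(size=(n, 2))   # e.g. mutual-gaze proportion, blink rate
y = rng.integers(0, 3, size=n)       # three involvement classes (synthetic labels)

# Feature-level fusion: concatenate the two cue streams.
X = np.hstack([acoustic, visual])

# Standardize features, then fit a multiclass SVM (RBF kernel assumed).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("mean CV accuracy:", scores.mean())
```

An alternative to this feature-level fusion is decision-level fusion, i.e. training separate acoustic and visual SVMs and combining their outputs; which variant the study used cannot be determined from the abstract alone.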
