Towards the Automatic Detection of Involvement in Conversation

Although an increasing amount of research has been carried out into human-machine interaction over the last century, even today we are not able to fully understand the dynamic changes in human interaction. Only when we achieve this will we be able to go beyond a one-to-one mapping between text and speech and add social information to speech technologies. Social information is expressed to a high degree through prosodic cues and through movement of the body and face. The aim of this paper is to use these cues to make one aspect of social information more tangible, namely participants' degree of involvement in a conversation. Our results for voice span and intensity, together with our preliminary results on body and face movement, suggest that these are reliable cues for detecting distinct levels of participants' involvement in conversation. This will allow for the development of a statistical model able to classify these levels of involvement. Our data indicate that involvement may be a scalar phenomenon.
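
As a concrete illustration of the prosodic cues discussed above, the following is a minimal sketch of how voice span and intensity might be extracted from a speech segment. It assumes Python with the parselmouth library (bindings for Praat, the analysis tool commonly used for such measurements); the file name, percentile range, and semitone conversion are illustrative assumptions, not the exact procedure used in the paper.

```python
# Sketch: extracting two prosodic involvement cues (voice span and
# mean intensity) from a recording, via parselmouth (Praat bindings).
import numpy as np
import parselmouth

snd = parselmouth.Sound("speaker_segment.wav")  # hypothetical input file

# Intensity contour (dB); its mean serves as a simple loudness cue.
intensity = snd.to_intensity()
mean_intensity_db = float(np.mean(intensity.values))

# F0 contour; unvoiced frames are reported as 0 Hz and must be removed.
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]

# Voice span as the 5th-95th percentile F0 range, expressed in
# semitones so that spans are comparable across speakers; the
# percentile trimming is an assumed guard against octave errors.
p05, p95 = np.percentile(f0, [5, 95])
span_semitones = 12.0 * np.log2(p95 / p05)

print(f"mean intensity: {mean_intensity_db:.1f} dB, "
      f"voice span: {span_semitones:.1f} st")
```

Per-segment features of this kind could then be fed to a standard statistical classifier (e.g. logistic regression) to predict discrete involvement levels, in line with the modelling goal stated above.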
