Overlapping speech, utterance duration and affective content in HHI and HCI - A comparison

In human conversation, turn-taking is a critical issue. Especially when only the speech channel is available (e.g., on the telephone), correct timing as well as affective and verbal signals are required. When these fail, overlapping speech may occur, which is the focus of this paper. We investigate the davero corpus, a large naturalistic spoken corpus of real call-center telephone conversations, and compare our findings to results on the well-known SmartKom corpus of human-computer interaction. We first show that overlapping speech occurs in different types of situational settings, extending the well-known categories of cooperative and competitive overlaps, all of which are frequent enough to be analyzed. Furthermore, we present connections between the occurrence of overlapping speech and the length of the previous utterance, and show that overlapping speech occurs at points in the dialog where certain affective states are changing. Our results allow impending overlapping speech to be predicted, and hence preventive measures to be taken, especially in professional environments such as call centers with human or automatic agents.
