Inter-labeler Agreement for Anger Detection in Interactive Voice Response Systems

Anger detection in speech-based automated telephone applications is a growing field of research. In this work, we report on inter-labeler agreement in a “real-life” anger detection task for Interactive Voice Response (IVR) systems. The study is based on a corpus of 1,911 calls containing 22,711 utterances, and we describe the considerations made prior to the rating process. We point out the difficulties we faced when annotating the corpus and present statistics and agreement values obtained after rating. The three raters who were asked to annotate angry user utterances agreed on the nature of “non-angry” utterances but had difficulty agreeing on how an angry user utterance should sound.
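The abstract does not name the agreement statistic used. The sketch below is a hedged illustration, not the authors' method. It assumes categorical utterance labels (for example “angry” vs. “non-angry”) and averages Cohen's kappa over all pairs of the three raters, which is one common way to summarize agreement for more than two annotators. The function names, rater names, and labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' label sequences (hypothetical helper)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of utterances both raters labeled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each rater's own label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one and the same label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all rater pairs (ratings: rater -> labels)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(ratings[r1], ratings[r2]) for r1, r2 in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical labels for six utterances from three raters.
    ratings = {
        "rater1": ["non-angry", "non-angry", "angry", "non-angry", "angry", "non-angry"],
        "rater2": ["non-angry", "non-angry", "non-angry", "non-angry", "angry", "non-angry"],
        "rater3": ["non-angry", "angry", "angry", "non-angry", "non-angry", "non-angry"],
    }
    print(f"mean pairwise kappa: {mean_pairwise_kappa(ratings):.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance. This matters in a corpus like this one, where most utterances are presumably non-angry: raters can show high raw agreement on the majority class while still disagreeing on the rare angry cases.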