D64: a corpus of richly recorded conversational interaction

In recent years there has been substantial debate about the need for corpora of spontaneous, conversational spoken interaction that are neither controlled nor task-directed. In parallel, the need has arisen to record multimodal corpora that are not restricted to the audio domain alone. A corpus that fulfills both needs would make it possible to investigate the natural coupling between participants, not only in turn-taking and voice but also in their movements. In this paper we describe the design and recording of such a corpus and provide some illustrative examples of how it might be exploited in the study of dynamic interaction. The D64 corpus is a multimodal corpus recorded over two successive days, each yielding approximately four hours of recordings. Five participants took part, two female and three male. Seven video cameras were used, with at least one trained on each participant, and the OptiTrack motion-capture system was used to enrich the audio and video with movement data. The D64 corpus comprises annotations of conversational involvement, speech activity, and pauses, as well as a measure of the average degree of change in participants' movement.
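The "average degree of change in movement" annotation is, in essence, a per-frame motion-energy measure derived from the motion-capture data. The abstract does not specify how it was computed for D64; the sketch below shows one plausible derivation from marker coordinates, assuming a simple frame-to-frame displacement averaged over markers (the function name and the displacement definition are illustrative assumptions, not the authors' published method).

```python
import numpy as np

def average_movement_change(frames: np.ndarray) -> np.ndarray:
    """Illustrative motion-energy measure for motion-capture data.

    `frames` has shape (n_frames, n_markers, 3): the 3-D position of
    each tracked marker at each frame. Returns, for each pair of
    consecutive frames, the marker displacement averaged over markers.
    This is an assumed stand-in for D64's "average degree of change
    in movement", not the corpus's documented formula.
    """
    # Euclidean displacement of every marker between consecutive frames.
    displacement = np.linalg.norm(np.diff(frames, axis=0), axis=2)
    # Average over markers -> one scalar per frame transition.
    return displacement.mean(axis=1)

# Example: 100 frames of 12 markers drifting as a small random walk.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 12, 3)).cumsum(axis=0) * 0.01
print(average_movement_change(frames)[:5])
```

Smoothing such a signal over a sliding window would give the kind of slowly varying movement profile that can be aligned with the involvement and speech-activity annotations.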
