Using Hotspots as a Novel Method for Accessing Key Events in a Large Multi-Modal Corpus

In 2009 we created the D64 corpus, a multi-modal corpus consisting of roughly eight hours of natural, non-directed spontaneous interaction in an informal setting. Five participants feature in the recordings, and their conversations were captured by microphones (room, body-mounted and head-mounted), video cameras and a motion capture system. The large amount of video, audio and motion capture material made it necessary to structure the corpus and make it available in a way that is easy to browse and to query for the various types of data we term primary, secondary and tertiary. While users can perform simple, highly structured searches, we discuss the use of conversational hotspots as a method of searching the data for key events in the corpus, thus enabling a user to obtain a broad overview of the data. In this paper we present an approach, based on our experience with the D64 corpus, to structuring and presenting a multi-modal corpus that is accessible over the web, incorporates an interactive front-end and is open to all interested researchers and students.
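
The abstract does not spell out how hotspots are detected, so the following is only a minimal illustrative sketch of the general idea: flagging time windows with dense, multi-party activity across time-aligned annotation tiers. The Annotation schema, the find_hotspots function, and its window and min_events parameters are hypothetical placeholders, not taken from the D64 tooling.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Annotation:
        """A time-aligned event from one modality tier (hypothetical schema)."""
        start: float    # seconds from session start
        end: float
        tier: str       # e.g. "speech", "laughter", "gesture"
        speaker: str

    def find_hotspots(annotations: List[Annotation],
                      window: float = 5.0,
                      min_events: int = 4) -> List[Tuple[float, float]]:
        """Return (start, end) windows whose event density exceeds a threshold.

        A crude density heuristic: slide a fixed window over the session and
        flag windows containing at least `min_events` annotations from at
        least two speakers, a rough proxy for lively, overlapping interaction.
        """
        if not annotations:
            return []
        session_end = max(a.end for a in annotations)
        hotspots = []
        t = 0.0
        while t < session_end:
            # Annotations overlapping the current window [t, t + window)
            in_window = [a for a in annotations
                         if a.start < t + window and a.end > t]
            speakers = {a.speaker for a in in_window}
            if len(in_window) >= min_events and len(speakers) >= 2:
                hotspots.append((t, t + window))
            t += window
        return hotspots

    # Example usage with made-up annotations:
    # anns = [Annotation(0.0, 1.2, "speech", "A"),
    #         Annotation(0.8, 2.0, "speech", "B"), ...]
    # print(find_hotspots(anns))

A real front-end would presumably rank or merge adjacent flagged windows and combine cues from several tiers (e.g. laughter and prosody), but the windowed-density idea is enough to show how hotspots can serve as entry points for browsing a large corpus.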
