MINT.tools: tools and adaptors supporting acquisition, annotation and analysis of multimodal corpora

This paper presents a collection of tools (and adaptors for existing tools) that we have recently developed, which support acquisition, annotation and analysis of multimodal corpora. For acquisition, an extensible architecture is offered that integrates various sensors, based on existing connectors (e.g. for motion capturing via VICON, or ART) and on connectors we contribute (for motion tracking via Microsoft Kinect as well as eye tracking via Seeingmachines FaceLAB 5). The architecture provides live visualisation of the multimodal data in a unified virtual reality (VR) view (using Fraunhofer Instant Reality) for control during recordings, and enables recording of synchronised streams. For annotation, we provide a connection between the annotation tool ELAN (MPI Nijmegen) and the VR visualisation. For analysis, we provide routines in the programming language Python that read in and manipulate (aggregate, transform, plot, analyse) the sensor data, as well as text annotation formats (Praat TextGrids). Use of this toolset in multimodal studies proved to be efficient and effective, as we discuss. We make the collection available as open source for use by other researchers.
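To give a flavour of the kind of analysis routine described above, the following minimal sketch aggregates, transforms and plots a recorded sensor stream using pandas and matplotlib; the file name and column names ("kinect_stream.csv", "time", "head_x") are hypothetical placeholders and do not reflect the actual MINT.tools export formats or function names.

```python
# Illustrative sketch only: the CSV layout and column names are assumptions,
# not the actual MINT.tools data format.
import pandas as pd
import matplotlib.pyplot as plt

# Load a synchronised sensor stream exported as CSV (time stamps in seconds).
frames = pd.read_csv("kinect_stream.csv")

# Aggregate: downsample the stream to 10 Hz by averaging within 100 ms bins.
frames["bin"] = (frames["time"] * 10).astype(int)
downsampled = frames.groupby("bin").mean()

# Transform: express head position relative to its session mean.
downsampled["head_x_centred"] = downsampled["head_x"] - downsampled["head_x"].mean()

# Plot: one channel over time, as a quick visual check of the recording.
downsampled.plot(x="time", y="head_x_centred", legend=False)
plt.xlabel("time (s)")
plt.ylabel("head x (centred)")
plt.savefig("head_x.png")
```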
