Evaluating a minimally invasive laboratory architecture for recording multimodal conversational data

This paper presents ongoing work on the design, deployment and evaluation of a multimodal data acquisition architecture that uses minimally invasive motion, head, eye and gaze tracking alongside high-quality audiovisual recording of human interactions. The individual data streams are collected at a single point and visualised in real time by integrating them into a virtual reality (VR) environment. The overall aim is to implement a multimodal data acquisition facility for studying non-verbal phenomena such as feedback gestures, hand and pointing gestures, and multimodal alignment. In the first part of the work, described here, a series of tests evaluated the feasibility of tracking feedback head gestures with the proposed architecture.
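As an illustration of the kind of processing such a facility enables, the sketch below groups head-tracker pitch samples into candidate nod events by thresholding their deviation from the median of the trace. The paper does not specify a detection method; the sample format, function names and thresholds here are hypothetical assumptions made for the example.

```python
# Illustrative sketch only: the paper does not describe its detection method.
# Assumes head-tracker output is available as (timestamp_s, pitch_deg) pairs;
# all names and thresholds below are hypothetical.
import statistics
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodEvent:
    start: float  # seconds
    end: float    # seconds


def detect_nods(samples: List[Tuple[float, float]],
                deviation_deg: float = 8.0,
                min_duration_s: float = 0.1) -> List[NodEvent]:
    """Group consecutive samples whose pitch deviates from the trace median
    by at least `deviation_deg` into candidate feedback-nod events."""
    baseline = statistics.median(p for _, p in samples)
    events: List[NodEvent] = []
    start = last = None
    for t, p in samples:
        if abs(p - baseline) >= deviation_deg:
            if start is None:
                start = t
            last = t
        elif start is not None:
            if last - start >= min_duration_s:
                events.append(NodEvent(start, last))
            start = None
    if start is not None and last - start >= min_duration_s:
        events.append(NodEvent(start, last))
    return events


if __name__ == "__main__":
    # Synthetic 20 Hz pitch trace with one downward nod around t = 0.5 s.
    trace = [(k * 0.05, -12.0 if 9 <= k <= 12 else 0.0) for k in range(40)]
    print(detect_nods(trace))  # expect one event spanning roughly 0.45-0.60 s
```

In the envisaged facility, a grouping step of this kind could be applied to the centrally collected head-tracking stream before the events are visualised alongside the other modalities in the VR environment.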
