Evaluating a minimally invasive laboratory architecture for recording multimodal conversational data

This paper presents ongoing work on the design, deployment and evaluation of a multimodal data acquisition architecture that uses minimally invasive motion, head, eye and gaze tracking alongside high-quality audiovisual recording of human interactions. The individual data streams are collected at a single point and visualised in real time by integrating them into a virtual reality (VR) environment. The overall aim is to implement a multimodal data acquisition facility for studying non-verbal phenomena such as feedback gestures, hand and pointing gestures, and multimodal alignment. In the first part of the work, described here, a series of tests evaluated the feasibility of tracking feedback head gestures with the proposed architecture.
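As an illustration of the kind of processing such a facility enables, the sketch below groups head-tracker pitch samples into candidate nod events by thresholding their deviation from the median of the trace. The paper does not specify a detection method; the sample format, function names and thresholds here are hypothetical assumptions made for the example.

```python
# Illustrative sketch only: the paper does not describe its detection method.
# Assumes head-tracker output is available as (timestamp_s, pitch_deg) pairs;
# all names and thresholds below are hypothetical.
import statistics
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodEvent:
    start: float  # seconds
    end: float    # seconds


def detect_nods(samples: List[Tuple[float, float]],
                deviation_deg: float = 8.0,
                min_duration_s: float = 0.1) -> List[NodEvent]:
    """Group consecutive samples whose pitch deviates from the trace median
    by at least `deviation_deg` into candidate feedback-nod events."""
    baseline = statistics.median(p for _, p in samples)
    events: List[NodEvent] = []
    start = last = None
    for t, p in samples:
        if abs(p - baseline) >= deviation_deg:
            if start is None:
                start = t
            last = t
        elif start is not None:
            if last - start >= min_duration_s:
                events.append(NodEvent(start, last))
            start = None
    if start is not None and last - start >= min_duration_s:
        events.append(NodEvent(start, last))
    return events


if __name__ == "__main__":
    # Synthetic 20 Hz pitch trace with one downward nod around t = 0.5 s.
    trace = [(k * 0.05, -12.0 if 9 <= k <= 12 else 0.0) for k in range(40)]
    print(detect_nods(trace))  # expect one event spanning roughly 0.45-0.60 s
```

In the envisaged facility, a grouping step of this kind could be applied to the centrally collected head-tracking stream before the events are visualised alongside the other modalities in the VR environment.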
