The Workshop Programme: Multimodal Corpora: Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces

We present VisSTA (Visualization for Situated Temporal Analysis), a multimedia system that facilitates the analysis of multimodal human communication by integrating video, audio, speech transcriptions, and coded multimodal data (e.g., gesture and gaze). VisSTA is based on the Multiple Linked Representation strategy and keeps the user temporally situated by ensuring tight linkage among all representational components. These representations include a hierarchical video-shot organization, a variety of animated graphs, animated multi-tier text transcripts, and an avatar. VisSTA is a multi-video system that permits simultaneous playback of multiple synchronized video streams, time-locked to the other data components. An integrated observation database stores the results of data analysis.
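To make the "tight linkage" idea concrete, the sketch below shows one common way such multiple linked representations can be kept time-locked: a shared clock broadcasts the current media time to every registered view (video pane, transcript tier, graph cursor) so that a seek in any one component repositions all of them. This is a minimal illustration under assumed names (TimeHub, register, seek); it is not VisSTA's actual implementation.

```python
# Minimal sketch of time-locked linked representations (illustrative only;
# class and method names here are assumptions, not VisSTA's API).
from typing import Callable, List


class TimeHub:
    """Hypothetical synchronization hub shared by all linked views."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[float], None]] = []
        self._time: float = 0.0  # current media position in seconds

    def register(self, on_time: Callable[[float], None]) -> None:
        """Add a view (video pane, transcript tier, graph) to keep in sync."""
        self._listeners.append(on_time)

    def seek(self, t: float) -> None:
        """Jump every linked representation to time t."""
        self._time = t
        for notify in self._listeners:
            notify(t)


# Usage: each representation reacts to the same seek, so the user stays
# temporally situated across all components.
hub = TimeHub()
hub.register(lambda t: print(f"video frame at {t:.2f}s"))
hub.register(lambda t: print(f"transcript word at {t:.2f}s"))
hub.register(lambda t: print(f"gesture graph cursor at {t:.2f}s"))
hub.seek(12.5)
```

A publish-subscribe arrangement like this keeps each representation independent while guaranteeing that every seek, play, or scrub repositions all views together, which is the property the abstract describes as keeping the user temporally situated.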
