MacVisSTA: A System for Multimodal Analysis of Human Communication and Interaction

The study of embodied communication requires access to multiple data sources, such as multistream video and audio, together with derived data and metadata such as gesture, head, posture, facial expression, and gaze information. This thesis presents the data collection, annotation, and analysis for multiple participants engaged in planning meetings. In support of these analysis tasks, it presents the multimedia Visualization for Situated Temporal Analysis for Macintosh (MacVisSTA) system, which supports the analysis of multimodal human communication through video, audio, speech transcriptions, and gesture and head orientation data. The system uses a multiple linked representation strategy in which the different representations are linked by the current time focus. MacVisSTA supports analysis of the synchronized data at varying timescales for coarse-to-fine observational studies. Its hybrid architecture may be extended through plugins. Finally, this effort has resulted in the encoding of behavioral and language data, which enables collaborative research and is embodied with the aid of, and through an interface to, a database management system.
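
The linked-representation idea described above can be illustrated with a small sketch. This is not MacVisSTA's code: the class names (`TimeFocus`, `VideoView`, `TranscriptView`), the frame rate, and the word timings are all hypothetical and serve only to show how several views might follow one shared time focus.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, List, Tuple


class TimeFocus:
    """Hypothetical shared 'current time' that notifies every linked view."""

    def __init__(self) -> None:
        self.seconds = 0.0
        self._listeners: List[Callable[[float], None]] = []

    def subscribe(self, listener: Callable[[float], None]) -> None:
        self._listeners.append(listener)

    def set(self, seconds: float) -> None:
        # Moving the focus updates all representations together.
        self.seconds = seconds
        for listener in self._listeners:
            listener(seconds)


@dataclass
class VideoView:
    """Shows the video frame that corresponds to the time focus."""
    fps: float
    frame: int = 0

    def on_time(self, seconds: float) -> None:
        self.frame = int(seconds * self.fps)


@dataclass
class TranscriptView:
    """Highlights the word being spoken at the time focus, if any."""
    words: List[Tuple[float, float, str]]  # (start, end, word), sorted by start
    current: str = ""

    def on_time(self, seconds: float) -> None:
        starts = [start for start, _, _ in self.words]
        i = bisect_right(starts, seconds) - 1  # last word starting at or before t
        if i >= 0 and seconds < self.words[i][1]:
            self.current = self.words[i][2]
        else:
            self.current = ""  # focus falls in a pause between words


# Illustrative wiring with made-up data.
focus = TimeFocus()
video = VideoView(fps=30.0)
transcript = TranscriptView(words=[(0.0, 1.0, "so"), (2.0, 3.0, "here")])
focus.subscribe(video.on_time)
focus.subscribe(transcript.on_time)
focus.set(2.5)  # both views now reflect t = 2.5 s
```

In this sketch each representation only knows how to render itself for a given time, so a new representation (for example a gesture track) could be linked by subscribing one more listener.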
