Multimodal Content Representation for Speech and Gesture Production

This paper presents a computational perspective on the joint production of speech and gesture. Based on empirical evidence indicating a mutual influence of speech and gesture in utterance production, we propose an interface between imagistic and propositional knowledge at the level of content representation. This interface is integrated into a generation architecture in which the planning of content and the planning of form across both modalities proceed interactively.
