The shrink point: audiovisual integration of speech-gesture synchrony

The focus in gesture research has long been on the production of speech-accompanying gestures and on how speech-gesture utterances contribute to communication. An issue that has mostly been neglected is to what extent listeners even perceive the gestural part of a multimodal utterance. Gesture research has, for instance, concentrated on the lexico-semiotic connection between spontaneously coproduced gestures and speech (e.g., de Ruiter, 2007; Kita & Ozyurek, 2003; Krauss, Chen & Gottesman, 2000). Because of the rather precise timing between the prosodic peak in speech and the most prominent stroke of the gesture phrase in production, Schegloff (1984) and Krauss, Morrel-Samuels and Colasante (1991; also Rauscher, Krauss & Chen, 1996), among others, described the phenomenon of lexical affiliation. Following Krauss et al. (1991), the first empirical study of this dissertation investigates the nature of the semiotic relation between speech and gestures, focusing on its applicability to temporal perception and comprehension. When speech and lip movements diverge too far from their original production synchrony, viewers find this highly irritating, even when audio and video stem from the same recording (e.g., Vatakis, Navarra, Soto-Faraco & Spence, 2008; Feyereisen, 2007): there is only a small temporal window of audiovisual integration (AVI) within which viewer-listeners can internally align discrepancies between lip movements and the speech supposedly produced by them (e.g., McGurk & MacDonald, 1976). Several studies in psychophysics (e.g., Nishida, 2006; Fujisaki & Nishida, 2005) have found that a comparable time window exists for the perceptual alignment of nonspeech visual and auditory signals. These and further studies on the AVI of speech-lip asynchronies have inspired research on the perception of speech-gesture utterances.
McNeill, Cassell, and McCullough (1994; Cassell, McNeill & McCullough, 1999), for instance, discovered that listeners take up information even from artificially combined speech and gestures. More recent studies of the AVI of speech and gestures have investigated the perception of multimodal utterances with methods such as eye tracking and event-related potentials (ERPs) (e.g., Gullberg & Holmqvist, 1999; 2006; Ozyurek, Willems, Kita & Hagoort, 2007; Habets, Kita, Shao, Ozyurek & Hagoort, 2011). While these studies from psychophysics, speech-only research, and speech-gesture research have contributed greatly to theories of how listeners perceive multimodal signals, explorations of natural data and of dyadic situations have been lacking. This dissertation investigates the perception of naturally produced speech-gesture utterances by having participants rate the naturalness of synchronous and asynchronous versions of such utterances, using qualitative and quantitative methodologies including an online rating study and a preference task. Drawing on speech-gesture production models based on Levelt's (1989) model of speech production (e.g., de Ruiter, 1998; 2007; Krauss et al., 2000; Kita & Ozyurek, 2003), and building on the results and analyses of the studies conducted for this dissertation, I finally propose a draft model of a possible transmission cycle between the Growth Point (e.g., McNeill, 1985; 1992) and the Shrink Point, the perceptual counterpart of the Growth Point. This model covers the temporal and semantic alignment of speech and different gesture types as well as their audiovisual and conceptual integration during perception. The perceptual studies conducted within the scope of this dissertation have revealed varying temporal ranges within which listeners can integrate asynchronies in speech-gesture utterances, especially those involving iconic gestures.

[1]  Jan Peter de Ruiter,et al.  The Interplay Between Gesture and Speech in the Production of Referring Expressions: Investigating the Tradeoff Hypothesis , 2012, Top. Cogn. Sci..

[2]  Milton Chen Achieving effective floor control with a low-bandwidth gesture-sensitive videoconferencing system , 2002, MULTIMEDIA '02.

[3]  De Ruiter,et al.  Can gesticulation help aphasic people speak, or rather, communicate? , 2006 .

[4]  B. Butterworth,et al.  Iconic gestures, imagery, and word retrieval in speech , 1997 .

[5]  D. McNeill So you think gestures are nonverbal , 1985 .

[6]  Carolin Kirchhof,et al.  So What's Your Affiliation With Gesture? , 2011 .

[7]  James F. Allen Maintaining knowledge about temporal intervals , 1983, CACM.

[8]  D. Poeppel,et al.  Temporal window of integration in auditory-visual speech perception , 2007, Neuropsychologia.

[9]  B. Butterworth,et al.  Gesture, speech, and computational stages: a reply to McNeill. , 1989, Psychological review.

[10]  D. McNeill Gesture and Thought , 2005 .

[11]  Beth Levy,et al.  Conceptual Representations in Language Activity and Gesture , 1980 .

[12]  Paul Watzlawick,et al.  Some Tentative Axioms of Communication , 1967 .

[13]  Willem J. M. Levelt,et al.  Gesture and the communicative intention of the speaker , 2005 .

[14]  W. Levelt,et al.  Speaking: From Intention to Articulation , 1990 .

[15]  Jan Peter De Ruiter,et al.  The synchronization of Gesture and Speech in Dutch and Arrernte (an Australian Aboriginal language): A Cross-cultural comparison , 1998 .

[16]  De Ruiter,et al.  The function of hand gesture in spoken conversation , 2003 .

[17]  J. D. Ruiter The production of gesture and speech , 2000 .

[18]  C. Spence,et al.  Audiovisual temporal adaptation of speech: temporal order versus simultaneity judgments , 2008, Experimental Brain Research.

[19]  Jeffery A. Jones,et al.  Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information , 2004, Journal of Cognitive Neuroscience.

[20]  Brian Butterworth,et al.  Gesture and Silence as Indicators of Planning in Speech , 1978 .

[21]  D. Robertson-Ritchie,et al.  The naked ape , 1968 .

[22]  P. Ekman Facial expressions of emotion: an old controversy and new findings. , 1992, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[23]  P. Ekman Emotions Revealed: Recognizing Faces and Feelings to Improve Communication and Emotional Life , 2003 .

[24]  L. Vygotsky Thinking and Speech , 1987 .

[25]  A. Kendon Gesticulation and Speech: Two Aspects of the Process of Utterance , 1981 .

[26]  C. Creider Hand and Mind: What Gestures Reveal about Thought , 1994 .

[27]  A. Kendon Some Relationships Between Body Motion and Speech , 1972 .

[28]  W. Stokoe,et al.  Sign language structure: an outline of the visual communication systems of the American deaf. 1960. , 1961, Journal of deaf studies and deaf education.

[29]  S. Duncan Gestural imagery and cohesion in normal and impaired discourse , 2008 .

[30]  Pierre Feyereisen,et al.  Manual Activity During Speaking in Aphasic Subjects , 1983 .

[31]  Onno Crasborn,et al.  Enhanced ELAN functionality for sign language corpora , 2008, LREC 2008.

[32]  M. Pickering,et al.  Toward a mechanistic psychology of dialogue , 2004, Behavioral and Brain Sciences.

[33]  Justine Cassell,et al.  Communicative Effects of Speech-Mismatched Gestures , 1994 .

[34]  G. Beattie,et al.  An experimental investigation of the role of iconic gestures in lexical access using the tip-of-the-tongue phenomenon. , 1999, British journal of psychology.

[35]  Patric Bach,et al.  Communicating hands: ERPs elicited by meaningful symbolic hand postures , 2004, Neuroscience Letters.

[36]  S. Goldin-Meadow,et al.  What the teacher's hands tell the student's mind about math. , 1999 .

[37]  Wolfram Ziegler,et al.  The actual and potential use of gestures for communication in aphasia , 2013 .

[38]  A. Einstein Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt [AdP 17, 132 (1905)] , 2005, Annalen der Physik.

[39]  K. Rudnick The Essential Peirce: Selected Philosophical Writings , 1993 .

[40]  D. Slobin Thinking for Speaking , 1987 .

[41]  Peter Indefrey,et al.  The Spatial and Temporal Signatures of Word Production Components: A Critical Update , 2011, Front. Psychology.

[42]  Kenneth Holmqvist,et al.  Keeping an eye on gestures: Visual perception of gestures in face-to-face communication , 1999 .

[43]  Yihsiu Chen,et al.  Language and Gesture: Lexical gestures and lexical access: a process model , 2000 .

[44]  Shin'ya Nishida Interactions and Integrations of Multiple Sensory Channels in Human Brain , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[45]  Sotaro Kita,et al.  On-line Integration of Semantic Information from Speech and Gesture: Insights from Event-related Brain Potentials , 2007, Journal of Cognitive Neuroscience.

[46]  Ellen Fricke,et al.  Grammatik multimodal : wie Wörter und Gesten zusammenwirken , 2012 .

[47]  David McNeill,et al.  Why We Gesture: The Surprising Role of Hand Movements in Communication , 2015 .

[48]  Michael Neff, Michael Kipp, Irene Albrecht, Hans-Peter Seidel  Gesture modeling and animation based on a probabilistic re-creation of speaker style , 2007, ACM Trans. Graph..

[49]  D. McNeill,et al.  A straight path-to where? Reply to Butterworth and Hadar. , 1989, Psychological review.

[50]  Stefan Kopp,et al.  How is information distributed across speech and gesture? A cognitive modeling approach , 2014 .

[51]  L. Marstaller,et al.  The multisensory perception of co-speech gestures – A review and meta-analysis of neuroimaging studies , 2014, Journal of Neurolinguistics.

[52]  R. Krauss,et al.  Do conversational hand gestures communicate? , 1991, Journal of personality and social psychology.

[53]  B. Butterworth,et al.  Speech and Interaction in Sound-only Communication Channels , 1977 .

[54]  Robin R. Murphy,et al.  Evaluation of Head Gaze Loosely Synchronized With Real-Time Synthetic Speech for Social Robots , 2014, IEEE Transactions on Human-Machine Systems.

[55]  Zeshu Shao,et al.  The Role of Synchrony and Ambiguity in Speech–Gesture Integration during Comprehension , 2011, Journal of Cognitive Neuroscience.

[56]  Marshall R. Mayberry,et al.  Learning to Attend: A Connectionist Model of Situated Language Comprehension , 2009, Cogn. Sci..

[57]  Shin’ya Nishida,et al.  Feature-based processing of audio-visual synchrony perception revealed by random pulse trains , 2007, Vision Research.

[58]  Waka Fujisaki,et al.  Top-down feature-based selection of matching features for audio-visual synchrony discrimination , 2008, Neuroscience Letters.

[59]  R. Campbell,et al.  Hearing by Eye , 1980, The Quarterly journal of experimental psychology.

[60]  Karl Bühler,et al.  Theory of Language: The Representational Function of Language , 2011 .

[61]  Michael J. Spivey,et al.  Syntactic ambiguity resolution in discourse: modeling the effects of referential context and lexical frequency. , 1998, Journal of experimental psychology. Learning, memory, and cognition.

[62]  S. Goldin-Meadow  Hearing Gesture: How Our Hands Help Us Think , 2003 .

[63]  Frieda Goldman Eisler Psycholinguistics : experiments in spontaneous speech , 1968 .

[64]  Dominic W. Massaro,et al.  Perceiving asynchronous bimodal speech in consonant-vowel and vowel syllables , 1993, Speech Commun..

[65]  A. Kendon Gesture: Visible Action as Utterance , 2004 .

[66]  Stefan Kopp,et al.  Implementing a non-modular theory of language production in an embodied , 2008 .

[67]  M. Argyle Bodily communication, 2nd ed. , 1988 .

[68]  F. Pollick,et al.  Expertise with multisensory events eliminates the effect of biological motion rotation on audiovisual synchrony perception. , 2010, Journal of vision.

[69]  Marvin Karlins,et al.  What Every BODY is Saying: An Ex-FBI Agent's Guide to Speed-Reading People , 2008 .

[70]  D. Massaro Speech Perception By Ear and Eye: A Paradigm for Psychological Inquiry , 1989 .

[71]  Przemyslaw Lenkiewicz,et al.  Towards Automatic Gesture Stroke Detection , 2012, LREC.

[72]  M. Alibali,et al.  Transitions in concept acquisition: using the hand to read the mind. , 1993, Psychological review.

[73]  R. Krauss,et al.  Word Familiarity Predicts Temporal Asynchrony of Hand Gestures and Speech , 2010 .

[74]  Satoru Hayamizu,et al.  Hand Gestures of an Anthropomorphic Agent: Listeners' Eye Fixation and Comprehension , 2000 .

[75]  Siobhan Chapman Logic and Conversation , 2005 .

[76]  D. Morris Manwatching : a field guide to human behaviour , 1977 .

[77]  Sotaro Kita,et al.  Attention to Speech-Accompanying Gestures: Eye Movements and Information Uptake , 2009, Journal of nonverbal behavior.

[78]  Stefan Kopp,et al.  Alignment in communication : towards a new theory of communication , 2013 .

[79]  D. McNeill Language and Gesture: Gesture in action , 2000 .

[80]  G. Beattie,et al.  Gestures, pauses and speech: An experimental investigation of the effects of changing social context on their precise temporal relationships , 2009 .

[81]  W. Chafe Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing , 1996 .

[82]  Gesture categorisation and understanding speaker attention to gesture , 2010 .

[83]  Gillian Hovell,et al.  Body language , 1997, Nature.

[84]  Janet Beavin Bavelas,et al.  Gesturing on the telephone: Independent effects of dialogue and visibility. , 2008 .

[85]  Kenneth Holmqvist,et al.  What speakers do and what addressees look at: visual attention to gestures in human interaction live and on video , 2006 .

[86]  Stefan Kopp,et al.  Mapping out the multifunctionality of speakers’ gestures , 2016 .

[87]  S. Nobe,et al.  Representational gestures, cognitive rhythms, and acoustic aspects of speech: A network threshold model of gesture production , 1996 .

[88]  R. Krauss,et al.  Gesture, Speech, and Lexical Access: The Role of Lexical Movements in Speech Production , 1996, Psychological Science.

[89]  Xavier Seron,et al.  Nonverbal communication and aphasia: a review II. Expression , 1982, Brain and Language.

[90]  G. Beattie,et al.  Do Iconic Hand Gestures Really Contribute to the Communication of Semantic Information in a Face-to-Face Context? , 2009 .

[91]  D W Massaro,et al.  Perception of asynchronous and conflicting visual and auditory speech. , 1996, The Journal of the Acoustical Society of America.

[92]  D. Slobin,et al.  From Gestures to Signs in the Acquisition of Sign Language 1 , 2007 .

[93]  De Ruiter,et al.  Postcards from the mind: The relationship between speech, imagistic gesture and thought , 2007 .

[94]  M. Alibali,et al.  Gesture-Speech Mismatch and Mechanisms of Learning: What the Hands Reveal about a Child′s State of Mind , 1993, Cognitive Psychology.

[95]  Sotaro Kita,et al.  What does cross-linguistic variation in semantic coordination of speech and gesture reveal? Evidence for an interface representation of spatial thinking and speaking , 2003 .

[96]  Susan Duncan,et al.  Growth points in thinking-for-speaking , 1998 .

[97]  E. Schegloff Structures of Social Action: On some gestures' relation to talk , 1985 .

[98]  Sotaro Kita,et al.  Gestures and speech disfluencies , 2003 .

[99]  Waka Fujisaki,et al.  Temporal frequency characteristics of synchrony–asynchrony discrimination of audio-visual signals , 2005, Experimental Brain Research.

[100]  D. McNeill,et al.  Speech-gesture mismatches: Evidence for one underlying representation of linguistic and nonlinguistic information , 1998 .

[101]  D. McNeill,et al.  IW - “The Man Who Lost His Body” , 2010 .

[102]  Frieda Goldman-Eisler The Predictability of Words in Context and the Length of Pauses in Speech , 1961 .

[103]  Michael Neff,et al.  State of the Art in Hand and Finger Modeling and Animation , 2015, Comput. Graph. Forum.

[104]  Susan Goldin-Meadow,et al.  Gesturing has a larger impact on problem-solving than action, even when action is accompanied by words , 2015, Language, cognition and neuroscience.

[105]  Stefan Kopp,et al.  Gesture and speech in interaction: An overview , 2014, Speech Commun..

[106]  M. Alibali,et al.  Gesture's role in speaking, learning, and creating language. , 2013, Annual review of psychology.

[107]  D. McNeill How Language Began: Gesture and Speech in Human Evolution , 2012 .

[108]  C. Kirchhof The Truth about Mid-Life Singles in the USA: A Corpus-Based Analysis of Printed Personal Advertisements , 2010 .

[109]  Dafydd Gibbon,et al.  Gesture Theory is Linguistics: On Modelling Multimodality as Prosody , 2009, PACLIC.

[110]  Carola de Beer,et al.  A critical evaluation of models of gesture and speech production for understanding gesture in aphasia , 2013 .

[111]  Stefan Kopp,et al.  The Relation of Speech and Gestures: Temporal Synchrony Follows Semantic Synchrony , 2011 .

[112]  Adam Kendon,et al.  How gestures can become like words , 1988 .

[113]  David Escudero Mancebo,et al.  DISENTANGLING AND CONNECTING DIFFERENT PERSPECTIVES ON PROSODIC PROMINENCE , 2015 .

[114]  P. Ekman,et al.  The Repertoire of Nonverbal Behavior: Categories, Origins, Usage, and Coding , 1969 .

[115]  M. Alibali,et al.  Effects of Visibility between Speaker and Listener on Gesture Production: Some Gestures Are Meant to Be Seen , 2001 .

[116]  O. Capirci,et al.  Cross-linguistic Views of Gesture Usage , 2015 .

[117]  Evelyn McClave,et al.  Gestural beats: The rhythm hypothesis , 1994 .

[118]  De Ruiter,et al.  Gesture and speech production , 1998 .

[119]  Jean-Claude Martin,et al.  Gesture and emotion: Can basic gestural form features discriminate emotions? , 2009, 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops.

[120]  Susan Duncan,et al.  Co-expressivity of Speech and Gesture: Manner of Motion in Spanish, English, and Chinese , 2001 .

[121]  P. Feyereisen How do gesture and speech production synchronise , 2007 .

[122]  A. Kendon Some functions of gaze-direction in social interaction. , 1967, Acta psychologica.

[123]  A. Kendon Gestures as illocutionary and discourse structure markers in Southern Italian conversation , 1995 .