Quantitative analysis of multimodal speech data

This study presents techniques for quantitatively analyzing coordination and kinematics in multimodal speech using video, audio, and electromagnetic articulography (EMA) data. Multimodal speech research has flourished thanks to recent improvements in technology, yet strategies for detecting and annotating gestures vary widely, which makes it difficult to generalize across studies and to advance the field. We describe how FlowAnalyzer software can be used to extract kinematic signals from basic video recordings, and we apply a technique derived from speech kinematic research to detect bodily gestures in these signals. We investigate whether the kinematic characteristics of multimodal speech differ depending on communicative context, and we find that these contexts can be distinguished quantitatively, suggesting a way to improve and standardize existing gesture identification and annotation strategies. We also discuss Correlation Map Analysis (CMA), a method for quantifying the relationship between speech and bodily gesture kinematics over time, and describe its potential applications to multimodal speech research, such as characterizing speech-gesture coordination in different communicative contexts. By applying quantitative methods to the detection and description of multimodal speech, the techniques presented here can improve and advance multimodal speech and gesture research.
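The idea of quantifying a time-varying relationship between two kinematic signals can be illustrated with a minimal sketch. The code below is not the CMA algorithm itself; it is a simplified stand-in that computes a sliding-window Pearson correlation between two one-dimensional signals over a range of lags, which yields a time-by-lag map of correlation values. The function name `windowed_correlation_map` and its parameters `window` and `max_lag` are hypothetical and chosen only for illustration.

```python
import numpy as np


def windowed_correlation_map(x, y, window, max_lag):
    """Sliding-window Pearson correlation between x and lagged y.

    Illustrative only: a simplified stand-in, not the published CMA method.
    The effective window is 2 * (window // 2) + 1 samples, centred on t.
    Returns (lags, cmap) where cmap[i, t] is the correlation between the
    window of x centred on t and the window of y centred on t + lags[i].
    Entries where a window does not fit, or a window is constant, are NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")

    n = x.size
    half = window // 2
    lags = np.arange(-max_lag, max_lag + 1)
    cmap = np.full((lags.size, n), np.nan)

    for i, lag in enumerate(lags):
        for t in range(n):
            xs, xe = t - half, t + half + 1      # window bounds in x
            ys, ye = xs + lag, xe + lag          # shifted bounds in y
            if xs < 0 or ys < 0 or xe > n or ye > n:
                continue                         # window falls off the signal
            xw, yw = x[xs:xe], y[ys:ye]
            sx, sy = xw.std(), yw.std()
            if sx == 0 or sy == 0:
                continue                         # correlation undefined
            cov = np.mean((xw - xw.mean()) * (yw - yw.mean()))
            cmap[i, t] = cov / (sx * sy)
    return lags, cmap


if __name__ == "__main__":
    # Synthetic example: y is x delayed by 5 samples.
    t = np.arange(200)
    x = np.sin(2 * np.pi * t / 50)
    y = np.roll(x, 5)
    lags, cmap = windowed_correlation_map(x, y, window=41, max_lag=10)
    best = lags[np.argmax(np.nanmean(cmap, axis=1))]
    print("lag with highest mean correlation:", best)
```

In this synthetic case the second signal is a delayed copy of the first, so the row of the map at the matching lag holds the highest correlations. With real speech and gesture signals, the map would instead show how the strength and timing offset of the coordination change over the course of a recording.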
