Visual speech influences speech perception immediately but not automatically

Two experiments examined the time course of the use of auditory and visual speech cues in spoken word recognition using an eye-tracking paradigm. Experiment 1 showed that the use of visual speech cues from lipreading is reduced when concurrently presented pictures require a division of attentional resources. This reduction was evident even when listeners' eye gaze was on the speaker rather than on the (static) pictures. Experiment 2 used a deictic hand gesture to direct attention to the speaker, while the visual processing load was reduced by keeping the visual display constant over a fixed number of successive trials. Under these conditions, the visual speech cues from lipreading were used. Moreover, the eye-tracking data indicated that visual information was used immediately, and even earlier than auditory information. Taken together, these data indicate that visual speech cues are not used automatically, but when they are used, they are used immediately.
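To make the time-course claim concrete, the following is a minimal, illustrative sketch of the kind of fixation-proportion analysis commonly applied to eye-tracking data in this paradigm. The data frame, column names (subject, condition, time_ms, fix_target), the 50 ms bins, and the divergence threshold are assumptions introduced for illustration only, not the authors' actual analysis pipeline; the data are synthetic so the sketch runs end to end.

```python
# Illustrative sketch (not the authors' pipeline): fixation proportions over time.
# Assumes one row per eye-tracking sample with hypothetical columns:
#   subject, condition ('audiovisual' vs. 'auditory-only'), time_ms (relative to
#   target-word onset), and fix_target (1 if gaze is on the target picture).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulate a small synthetic data set purely so the example is self-contained.
rows = []
for subject in range(1, 9):
    for condition, delay in [("audiovisual", 200), ("auditory-only", 300)]:
        for trial in range(20):
            for time_ms in range(0, 1000, 20):                    # 20 ms samples
                p = 1 / (1 + np.exp(-(time_ms - delay) / 80))      # later rise without visual cues
                rows.append((subject, condition, time_ms, int(rng.random() < p)))
df = pd.DataFrame(rows, columns=["subject", "condition", "time_ms", "fix_target"])

# Bin time into 50 ms windows; average within subjects first, then across subjects.
df["bin"] = (df["time_ms"] // 50) * 50
by_subject = (df.groupby(["condition", "bin", "subject"])["fix_target"]
                .mean().reset_index())
curves = by_subject.groupby(["condition", "bin"])["fix_target"].mean().unstack(0)

# Earliest bin in which the audiovisual advantage exceeds an arbitrary threshold.
advantage = curves["audiovisual"] - curves["auditory-only"]
divergence = advantage[advantage > 0.05].index.min()
print(curves.round(2))
print(f"First 50 ms bin with an audiovisual advantage > .05: {divergence} ms")
```

In practice, the point of such an analysis is that the bin where fixation curves first diverge gives an estimate of how early visual speech cues influence word recognition relative to auditory cues.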
