Cross-Modal Prediction in Speech Perception

Speech perception often benefits from seeing the speaker's lip movements when they are available. One potential mechanism underlying this audiovisual gain is on-line prediction. In this study we asked whether preceding speech context presented in a single modality can improve audiovisual processing, and whether any such improvement rests on on-line information transfer across sensory modalities. In each trial, a speech fragment (the context) presented in a single sensory modality (voice or lips) was immediately continued by an audiovisual target fragment. Participants made speeded judgments about whether voice and lips matched in the target fragment. The leading single-modality context and the subsequent audiovisual target could be continuous in one modality only, in both modalities (the context continued into both channels of the target), or in neither (i.e., fully discontinuous). Audiovisual matching responses were faster when the context was continuous with the target within either the visual or the auditory channel (Experiment 1). Critically, a visual context also conferred an advantage when it was cross-modally continuous with the auditory channel of the target, whereas auditory-to-visual cross-modal continuity conferred no advantage (Experiment 2). These results suggest that visual speech information can benefit the on-line processing of upcoming auditory input through predictive mechanisms. We hypothesize that this benefit arises at an early level of speech analysis.
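To make the design concrete, the sketch below enumerates the context-to-target continuity conditions implied by the abstract. It is a minimal illustration, not the authors' materials: the `Trial` structure, the condition labels, and the enumeration order are all hypothetical, introduced only to show how a single-modality context (voice or lips) can be continued in the target by the same channel, the other channel, both, or neither.

```python
# A minimal sketch (not the authors' code) of the continuity conditions
# described in the abstract. All names here are hypothetical.
from dataclasses import dataclass

MODALITIES = ("auditory", "visual")  # modality of the leading context

@dataclass(frozen=True)
class Trial:
    context_modality: str        # single-modality context: "auditory" or "visual"
    continuous_channels: tuple   # target channels that continue the context

def build_conditions():
    """Enumerate context-to-target continuity conditions.

    The context can be continued in the target by the same modality
    (within-modal), by the other modality (cross-modal), by both
    channels, or by neither (fully discontinuous).
    """
    conditions = []
    for ctx in MODALITIES:
        other = "visual" if ctx == "auditory" else "auditory"
        for cont in [(ctx,), (other,), (ctx, other), ()]:
            conditions.append(Trial(ctx, cont))
    return conditions

if __name__ == "__main__":
    for t in build_conditions():
        kind = ("discontinuous" if not t.continuous_channels
                else "both" if len(t.continuous_channels) == 2
                else "within-modal" if t.context_modality in t.continuous_channels
                else "cross-modal")
        print(f"context={t.context_modality:8s} "
              f"continued-in={t.continuous_channels} -> {kind}")
```

On this reading, Experiment 1 compares the within-modal conditions against discontinuous baselines, while Experiment 2 isolates the two cross-modal conditions (visual context continuing into the auditory target channel, and the reverse), which is where the asymmetry reported in the abstract emerges.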
