Automatic audiovisual integration in speech perception

Two experiments examined whether features of both the visual and the acoustic input are always merged into the perceived representation of speech, and whether this audiovisual integration relies on cross-modal binding functions or on imitation. In a McGurk paradigm, participants repeated aloud a string of phonemes uttered by an actor (acoustic presentation) while the actor's mouth mimicked the pronunciation of a different string (visual presentation). In a control experiment, participants read the same strings printed as letters; this condition served to characterize the voice spectrum and the lip kinematics while controlling for imitation. In the control experiment and in the congruent audiovisual presentation, i.e. when the actor's articulatory mouth gestures matched the emitted string of phonemes, the voice spectrum and the lip kinematics varied according to the pronounced string. In the McGurk paradigm, participants were unaware of the incongruence between the visual and acoustic stimuli. Acoustic analysis of their spoken responses revealed three distinct patterns: fusion of the two stimuli (the McGurk effect), repetition of the acoustically presented string, and, less frequently, repetition of the string corresponding to the mouth gestures mimicked by the actor. In the latter two cases, however, the second formant (F2) of the participants' voice spectra always differed from the value recorded in the congruent audiovisual presentation, shifting toward the F2 of the string presented in the other, apparently ignored, modality. The lip kinematics of participants repeating the acoustically presented string were influenced by the observed lip movements of the actor, but only when a labial consonant was pronounced. The data support the hypothesis that features of both the visual and the acoustic input always contribute to the representation of a string of phonemes, and that cross-modal integration proceeds by extracting the mouth articulation features specific to the pronunciation of that string.
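
The paper does not specify its acoustic analysis pipeline, but the F2 comparison rests on standard formant estimation. As a rough, hedged illustration of how F2 can be extracted from a recorded response, here is a minimal LPC-based sketch in Python; the librosa dependency, the LPC order, the pre-emphasis coefficient, and the file name are assumptions for illustration, not the authors' method.

```python
import numpy as np
import librosa

def estimate_formants(y, sr, order=12):
    """Rough formant estimates from an all-pole (LPC) fit.

    Assumed method for illustration; real pipelines would analyze
    short frames centered on the vowel rather than the whole signal.
    """
    # Pre-emphasis flattens the spectral tilt so resonances stand out
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])
    # Fit an LPC model; its polynomial poles approximate vocal-tract resonances
    a = librosa.lpc(y, order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]              # keep one root per conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2.0 * np.pi))
    return freqs                                   # freqs[0] ~ F1, freqs[1] ~ F2

# Example: estimate F2 of a response for comparison with the congruent baseline
y, sr = librosa.load("response.wav", sr=None)      # hypothetical recording
f2 = estimate_formants(y, sr)[1]
```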
