Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus

Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS.

SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of the pSTS. Different pSTS regions prefer visually presented faces containing either a moving mouth or moving eyes, but only mouth-preferring regions respond strongly to voices.
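
The voxel-wise relationship described in the abstract (mouth preference correlating with auditory speech response) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the preference index (mouth response minus eye response), the function names, and all numbers below are assumptions made up purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mouth_preference(mouth_resp, eye_resp):
    """Hypothetical preference index: mouth response minus eye response.

    Positive values mean the voxel responds more to moving mouths,
    negative values mean it responds more to moving eyes.
    """
    return [m - e for m, e in zip(mouth_resp, eye_resp)]


# Made-up response amplitudes for five voxels (illustration only, not real data).
mouth = [1.2, 0.9, 0.4, 0.3, 1.5]    # response to visually presented moving mouths
eye = [0.3, 0.4, 0.8, 0.9, 0.2]      # response to visually presented moving eyes
speech = [0.8, 0.5, 0.0, -0.1, 1.1]  # response to auditory speech

pref = mouth_preference(mouth, eye)
r = pearson_r(pref, speech)
print(f"correlation between mouth preference and speech response: r = {r:.2f}")
```

In this toy data set the mouth-preferring voxels are also the ones with the largest speech responses, so the correlation is strongly positive. With real data the two measures would need to come from independent runs to avoid a circular analysis.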
