Dynamic, rhythmic facial expressions and the superior temporal sulcus of macaque monkeys: implications for the evolution of audiovisual speech

Audiovisual speech has a stereotypical rhythm of 2–7 Hz, and deviations from this frequency range in either modality reduce intelligibility. Understanding how audiovisual speech evolved requires investigating the origins of this rhythmic structure. One hypothesis is that the rhythm of speech evolved through the modification of pre‐existing cyclical jaw movements in a primate ancestor. We tested this hypothesis by investigating the temporal structure of lipsmacks and teeth‐grinds of macaque monkeys and the neural responses to these facial gestures in the superior temporal sulcus (STS), a region implicated in the processing of audiovisual communication signals in both humans and monkeys. We found that lipsmacks and teeth‐grinds have consistent but distinct peak frequencies, and that both fall well within the 2–7 Hz range of mouth movements associated with audiovisual speech. Single neurons and local field potentials in the STS of monkeys readily responded to such facial rhythms, but responded just as robustly to yawns, a nonrhythmic but dynamic facial expression. All expressions elicited enhanced power in the delta (0–3 Hz), theta (3–8 Hz), alpha (8–14 Hz) and gamma (> 60 Hz) frequency ranges, and suppressed power in the beta (20–40 Hz) range. Thus, the STS is sensitive to, but not selective for, rhythmic facial gestures. Taken together, these data support the idea that audiovisual speech evolved (at least in part) from the rhythmic facial gestures of an ancestral primate and that the STS was sensitive to, and thus ‘prepared’ for, the advent of rhythmic audiovisual communication.
