Norm-Based Coding of Voice Identity in Human Auditory Cortex

Summary

Listeners exploit small interindividual variations around a generic acoustical structure to discriminate and identify individuals from their voice, a key requirement for social interactions. The human brain contains temporal voice areas (TVA) [1] involved in an acoustic-based representation of voice identity [2, 3, 4, 5, 6], but the underlying coding mechanisms remain unknown. Indirect evidence suggests that identity representation in these areas could rely on a norm-based coding mechanism [4, 7, 8, 9, 10, 11]. Here, we show using fMRI that voice identity is coded in the TVA as a function of acoustical distance to two internal voice prototypes (one male, one female), approximated here by averaging a large number of same-gender voices through morphing [12]. Voices more distant from their prototype are perceived as more distinctive and elicit greater neuronal activity in voice-sensitive cortex than closer voices, a phenomenon not merely explained by neuronal adaptation [13, 14]. Moreover, explicitly manipulating distance-to-mean by morphing voices toward (or away from) their prototype elicits reduced (or enhanced) neuronal activity. These results indicate that voice-sensitive cortex integrates relevant acoustical features into a complex representation referenced to idealized male and female voice prototypes. More generally, they shed light on remarkable similarities in cerebral representations of facial and vocal identity.
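The distance-to-mean idea in the summary can be illustrated with a small numerical sketch. This is not the authors' pipeline: the study morphs audio recordings, whereas the sketch below works on hypothetical per-voice feature vectors. The three features, the toy values, and the function names `prototype`, `distance_to_mean`, and `morph` are all assumptions made for illustration.

```python
import numpy as np

def prototype(voices):
    """Approximate a prototype as the mean feature vector (rows = voices)."""
    return np.mean(voices, axis=0)

def distance_to_mean(voice, proto, scale):
    """Euclidean distance to the prototype after per-feature scaling."""
    return float(np.linalg.norm((voice - proto) / scale))

def morph(voice, proto, k):
    """Scale the offset from the prototype: k < 1 moves toward it, k > 1 away."""
    return proto + k * (voice - proto)

# Hypothetical features per voice (assumed, not from the paper):
# [f0 in Hz, formant dispersion in Hz, harmonics-to-noise ratio in dB]
female = np.array([[210.0, 1150.0, 20.0],
                   [190.0, 1100.0, 18.0],
                   [230.0, 1200.0, 22.0]])

proto = prototype(female)        # one prototype per gender; female only here
scale = female.std(axis=0)       # puts features on a comparable scale
voice = female[2]

d_orig = distance_to_mean(voice, proto, scale)
d_toward = distance_to_mean(morph(voice, proto, 0.5), proto, scale)
d_away = distance_to_mean(morph(voice, proto, 1.5), proto, scale)
print(d_toward < d_orig < d_away)
```

Under a norm-based account, the voice morphed toward the prototype would be expected to elicit less activity than the original, and the voice morphed away from it more. The sketch only shows how the distances themselves are ordered.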

[1] H. Wilson, et al. fMRI evidence for the neural representation of faces, 2005, Nature Neuroscience.

[2] K. Grill-Spector, et al. Repetition and the brain: neural models of stimulus-specific effects, 2006, Trends in Cognitive Sciences.

[3] Guillaume A. Rousselet, et al. Improving standards in brain-behavior correlation analyses, 2012, Front. Hum. Neurosci.

[4] Paul Boersma, et al. Praat, a system for doing phonetics by computer, 2002.

[5] I. Titze. Physiologic and acoustic differences between male and female voices, 1989, The Journal of the Acoustical Society of America.

[6] T. Valentine. A unified account of the effects of distinctiveness, inversion, and race in face recognition, 1991, The Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology.

[7] Tom E. Bishop, et al. Blind Image Restoration Using a Block-Stationary Signal Model, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[8] Julia Kastner, et al. Introduction to Robust Estimation and Hypothesis Testing, 2005.

[9] Rainer Goebel, et al. "Who" Is Saying "What"? Brain-Based Decoding of Human Voice and Speech, 2008, Science.

[10] N. Logothetis. The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal, 2002, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[11] P. Belin, et al. Anti-Voice Adaptation Suggests Prototype-Based Coding of Voice Identity, 2011, Front. Psychology.

[12] D. Lancker, et al. Voice discrimination and recognition are separate abilities, 1987, Neuropsychologia.

[13] Hideki Kawahara, et al. Auditory Adaptation in Voice Perception, 2008, Current Biology.

[14] David Alexander Kahn, et al. Confounding of norm-based and adaptation effects in brain responses, 2012, NeuroImage.

[15] Jason D. Warren, et al. Developmental phonagnosia: A selective deficit of vocal identity recognition, 2009, Neuropsychologia.

[16] M. Giese, et al. Norm-based face encoding by single neurons in the monkey inferotemporal cortex, 2006, Nature.

[17] Hideki Kawahara, et al. Vocal Attractiveness Increases by Averaging, 2010, Current Biology.

[18] Nicolas Davidenko, et al. Face-likeness and image variability drive responses in human face-selective ventral regions, 2012, Human brain mapping.

[19] Oliver Watts, et al. Roles of the average voice in speaker-adaptive HMM-based speech synthesis, 2010, INTERSPEECH.

[20] R. Zatorre, et al. Voice-selective areas in human auditory cortex, 2000, Nature.

[21] A. O'Toole, et al. Prototype-referenced shape encoding revealed by high-level aftereffects, 2001, Nature Neuroscience.

[22] A. Kleinschmidt, et al. Modulation of neural responses to speech by directing attention to voices or verbal content, 2003, Brain research. Cognitive brain research.

[23] G. Papcun, et al. Long-term memory for unfamiliar voices, 1989, The Journal of the Acoustical Society of America.

[24] Andrew J. Edmonds, et al. Familiar and unfamiliar face recognition: A review, 2009, Memory.

[25] Hideki Kawahara, et al. Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[26] Doris Y. Tsao, et al. What's so special about the average face?, 2006, Trends in Cognitive Sciences.

[27] Geoffrey Karl Aguirre, et al. Continuous carry-over designs for fMRI, 2007, NeuroImage.

[28] Yukiko Kikuchi, et al. Hierarchical Auditory Processing Directed Rostrally along the Monkey's Supratemporal Plane, 2010, The Journal of Neuroscience.

[29] Roy D. Patterson, et al. The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex, and age, 2005, The Journal of the Acoustical Society of America.

[30] J. Liénard, et al. Women use voice parameters to assess men's characteristics, 2006, Proceedings of the Royal Society B: Biological Sciences.

[31] K. Scherer, et al. Mapping emotions into acoustic space: The role of voice production, 2011, Biological Psychology.

[32] Pascal Belin, et al. Cerebral Processing of Voice Gender Studied Using a Continuous Carryover fMRI Design, 2012, Cerebral cortex.

[33] P. Belin, et al. Understanding voice perception, 2011, British journal of psychology.

[34] William J. Talkington, et al. Human Cortical Organization for Processing Vocalizations Indicates Representation of Harmonic Structure as a Signal Attribute, 2009, The Journal of Neuroscience.

[35] Pascal Belin, et al. Perceptual scaling of voice identity: common dimensions for different vowels and speakers, 2010, Psychological research.

[36] W. Fitch. Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques, 1997, The Journal of the Acoustical Society of America.

[37] J. Kreiman, et al. Perceptual evaluation of voice quality: review, tutorial, and a framework for future research, 1993, Journal of speech and hearing research.

[38] N. Logothetis, et al. Neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging, 2004.

[39] Pascal Belin, et al. Learning-induced changes in the cerebral processing of voice identity, 2011, Cerebral cortex.

[40] Pascal Belin, et al. Implicitly perceived vocal attractiveness modulates prefrontal cortex activity, 2012, Cerebral cortex.

[41] Johan Wagemans, et al. Dynamic Norm-based Encoding for Unfamiliar Shapes in Human Visual Cortex, 2011, Journal of Cognitive Neuroscience.

[42] Jody Kreiman, et al. Foundations of Voice Studies: An Interdisciplinary Approach to Voice Production and Perception, 2011.

[43] S. Campanella, et al. Integrating face and voice in person perception, 2007, Trends in Cognitive Sciences.

[44] James M. McQueen, et al. Neural mechanisms for voice recognition, 2010, NeuroImage.

[45] J. Rauschecker, et al. Mechanisms and streams for processing of "what" and "where" in auditory cortex, 2000, Proceedings of the National Academy of Sciences of the United States of America.

[46] Yizhar Lavner, et al. The Prototype Model in Speaker Identification by Human Listeners, 2001, Int. J. Speech Technol.

[47] J. Hillenbrand, et al. Acoustic characteristics of American English vowels, 1994, The Journal of the Acoustical Society of America.

[48] Thomas E. Nichols, et al. Nonparametric permutation tests for functional neuroimaging: A primer with examples, 2002, Human brain mapping.