Acoustic Correlates of Auditory Object and Event Perception: Speakers, Musical Timbres, and Environmental Sounds

Human listeners must identify and orient themselves to auditory objects and events in their environment. What acoustic features support a listener’s ability to differentiate the great variety of natural sounds they might encounter? Studies of auditory object perception typically examine identification (and confusion) responses or dissimilarity ratings between pairs of objects and events. However, the majority of this prior work has been conducted within single categories of sound. This separation has precluded a broader understanding of the general acoustic attributes that govern auditory object and event perception within and across different behaviorally relevant sound classes. The present experiments take a broader approach by examining multiple categories of sound relative to one another. This approach bridges critical gaps in the literature and allows us to identify (and assess the relative importance of) features that are useful for distinguishing sounds within, between and across behaviorally relevant sound categories. To do this, we conducted behavioral sound identification (Experiment 1) and dissimilarity rating (Experiment 2) studies using a broad set of stimuli that leveraged the acoustic variability within and between different sound categories via a diverse set of 36 sound tokens (12 utterances from different speakers, 12 instrument timbres, and 12 everyday objects from a typical human environment). Multidimensional scaling solutions as well as analyses of item-pair-level responses as a function of different acoustic qualities were used to understand what acoustic features informed participants’ responses. In addition to the spectral and temporal envelope qualities noted in previous work, listeners’ dissimilarity ratings were associated with spectrotemporal variability and aperiodicity. Subsets of these features (along with fundamental frequency variability) were also useful for making specific within or between sound category judgments. 
Dissimilarity ratings largely paralleled sound identification performance; however, the results of these tasks did not completely mirror one another. In addition, musical training was related to improved sound identification performance.
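
To illustrate how a multidimensional scaling solution can be derived from pairwise dissimilarity ratings, the following is a minimal sketch of classical (Torgerson) MDS in Python with NumPy. It is not necessarily the procedure used in these experiments. The function name `classical_mds`, the toy four-item dissimilarity matrix, and the choice of two dimensions are all assumptions made for illustration only.

```python
import numpy as np


def classical_mds(dissim, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric dissimilarity matrix.

    Hypothetical helper for illustration: classical (Torgerson) MDS,
    which assumes the dissimilarities behave roughly like Euclidean distances.
    """
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to obtain an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip tiny negative eigenvalues that arise from numerical error.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy data (assumed, not from the study): four items lying on a line,
# forming two loose clusters at positions 0, 1 and 4, 5.
positions = np.array([[0.0], [1.0], [4.0], [5.0]])
toy_dissim = np.abs(positions - positions.T)

coords = classical_mds(toy_dissim, n_dims=2)

# Pairwise distances in the recovered space, for comparison with the input.
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

In a sketch like this, items rated as similar end up close together in `coords`, and each recovered dimension can then be compared against candidate acoustic features (for example, by correlation) to ask which features might account for the ratings.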
