Causal inference in environmental sound recognition

Sound is caused by physical events in the world. Do humans infer these causes when recognizing sound sources? We tested whether the recognition of common environmental sounds depends on the inference of a basic physical variable – the source intensity (i.e., the power that produces a sound). A source’s intensity can be inferred from the intensity it produces at the ear and its distance, which is normally conveyed by reverberation. Listeners could thus use intensity at the ear and reverberation to constrain recognition by inferring the underlying source intensity. Alternatively, listeners might separate these acoustic cues from their representation of a sound’s identity in the interest of invariant recognition. We compared these two hypotheses by measuring recognition accuracy for sounds with typically low or high source intensity (e.g. pepper grinders vs. trucks) that were presented across a range of intensities at the ear or with reverberation cues to distance. The recognition of low-intensity sources (e.g. pepper grinders) was impaired by high presentation intensities or reverberation that conveyed distance, either of which implies high source intensity. Neither effect occurred for high-intensity sources. The results suggest that listeners implicitly use the intensity at the ear along with distance cues to infer a source’s power and constrain its identity. The recognition of real-world sounds thus appears to depend upon the inference of their physical generative parameters, even generative parameters whose cues might otherwise be separated from the representation of a sound’s identity.
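The inference described above can be illustrated with a simple worked example. The sketch below assumes an idealized point source in a free field with inverse-square spreading (roughly 6 dB of attenuation per doubling of distance); the function name, reference distance, and numbers are illustrative assumptions, not values or code from the paper. It shows why the same presentation level at the ear implies a much more powerful source when reverberation indicates that the source is far away.

```python
# Minimal sketch (not the authors' analysis code): inferring the level a source
# would produce at a reference distance, given the level at the ear and the
# inferred source distance, under free-field inverse-square spreading.

import math

def implied_source_level_db(level_at_ear_db: float, distance_m: float,
                            reference_distance_m: float = 1.0) -> float:
    """Level the source would need at the reference distance to produce the
    observed level at the ear, assuming 1/r^2 spreading (20*log10 distance rule)."""
    return level_at_ear_db + 20.0 * math.log10(distance_m / reference_distance_m)

# A sound presented at 70 dB SPL:
print(implied_source_level_db(70.0, 1.0))   # 70.0 dB at 1 m -> consistent with a quiet source
print(implied_source_level_db(70.0, 32.0))  # ~100.1 dB at 1 m -> implies a powerful source
```

Under this toy relation, a 70 dB presentation heard as coming from 32 m away implies a source producing about 100 dB at 1 m, which is implausible for a quiet source such as a pepper grinder; this is the sense in which intensity at the ear and distance cues jointly constrain source identity.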
