Ecological origins of perceptual grouping principles in the auditory system

Events and objects in the world must be inferred from sensory signals to support behavior. Because sensory measurements are temporally and spatially local, the estimation of an object or event can be viewed as the grouping of these measurements into representations of their common causes. Perceptual grouping is believed to reflect internalized regularities of the natural environment, yet grouping cues have traditionally been identified using informal observation, and investigated using artificial stimuli. The relationship of grouping to natural signal statistics has thus remained unclear, and additional or alternative cues remain possible. Here we derive auditory grouping cues by measuring and summarizing statistics of natural sound features. Feature co-occurrence statistics reproduced established cues but also revealed previously unappreciated grouping principles. The results suggest that auditory grouping is adapted to natural stimulus statistics, show how these statistics can reveal novel grouping phenomena, and provide a framework for studying grouping in natural signals.
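As a rough illustration of what a feature co-occurrence statistic can look like, the sketch below scores each pair of features by how much more often they are active together than independence would predict. It is an assumption-laden toy, not the procedure used in the paper: the abstract does not specify the features or the statistic, so the function name `cooccurrence_log_ratio`, the representation of a sound as a list of frames holding sets of active feature labels, and the log-ratio score are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def cooccurrence_log_ratio(frames):
    """Score feature pairs by log2( P(a, b) / (P(a) * P(b)) ).

    `frames` is a list of sets; each set holds the labels of the features
    that are active in one analysis frame (a hypothetical representation).
    Only pairs that co-occur at least once are returned.
    """
    n = len(frames)
    single = Counter()  # number of frames in which each feature is active
    pair = Counter()    # number of frames in which both features are active
    for active in frames:
        feats = sorted(set(active))  # sort so each pair has one canonical order
        single.update(feats)
        pair.update(combinations(feats, 2))
    return {
        (a, b): log2((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }


# Toy data (made up): f0 and f1 tend to appear together, f2 mostly on its own.
frames = [{"f0", "f1"}, {"f0", "f1"}, {"f2"}, {"f0", "f1", "f2"}]
scores = cooccurrence_log_ratio(frames)
for pair_key, score in sorted(scores.items()):
    print(pair_key, round(score, 3))
```

A positive score marks a pair that co-occurs more often than chance and is therefore a candidate for being grouped as arising from a common cause; a negative score marks a pair that co-occurs less often than chance. A realistic analysis would also condition on the time and frequency offsets between features, which this sketch omits.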
