Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre

Humans excel at using sounds to make judgements about their immediate environment. In particular, timbre is an auditory attribute that conveys crucial information about the identity of a sound source, especially for music. While timbre has been primarily considered to occupy a multidimensional space, unravelling the acoustic correlates of timbre remains a challenge. Here we re-analyse 17 datasets from studies published between 1977 and 2016 and observe that the original results are only partially replicable. We use a data-driven computational account to reveal the acoustic correlates of timbre. Human dissimilarity ratings are simulated with metrics learned on acoustic spectrotemporal modulation models inspired by cortical processing. We observe that timbre has both generic and experiment-specific acoustic correlates. These findings provide a broad overview of former studies on musical timbre and identify its relevant acoustic substrates according to biologically inspired models.
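
As a rough illustration of what learning a metric against dissimilarity ratings can look like, the sketch below fits non-negative per-dimension weights of a weighted Euclidean distance so that model distances between sounds match target dissimilarities. It is a minimal sketch under stated assumptions, not the procedure used in the study: the feature matrix is random synthetic data standing in for spectrotemporal modulation features, the "ratings" are generated from known weights, and the names `pairwise_sq_diffs` and `learn_weights` are hypothetical.

```python
import numpy as np


def pairwise_sq_diffs(X):
    """Per-dimension squared differences for every pair of sounds (i < j)."""
    i, j = np.triu_indices(X.shape[0], k=1)
    return (X[i] - X[j]) ** 2


def learn_weights(X, target_sq_dissim, n_iter=2000):
    """Fit weights w >= 0 so that sum_k w_k * (x_ik - x_jk)^2 approximates
    the target squared dissimilarities (projected gradient descent on the
    mean squared error)."""
    F = pairwise_sq_diffs(X)
    n_pairs = F.shape[0]
    w = np.ones(F.shape[1])
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = n_pairs / np.linalg.norm(F, 2) ** 2
    for _ in range(n_iter):
        grad = F.T @ (F @ w - target_sq_dissim) / n_pairs
        w = np.maximum(w - step * grad, 0.0)  # keep weights non-negative
    return w


# Synthetic stand-in data (assumption): 16 sounds, 5 feature dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 5))

# Simulated "ratings" generated from known weights; two dimensions are irrelevant.
w_true = np.array([2.0, 0.0, 0.5, 1.0, 0.0])
target = pairwise_sq_diffs(X) @ w_true

w_hat = learn_weights(X, target)

# Agreement between model distances and simulated ratings.
model_dist = np.sqrt(pairwise_sq_diffs(X) @ w_hat)
r = np.corrcoef(model_dist, np.sqrt(target))[0, 1]
print("learned weights:", np.round(w_hat, 3))
print("Pearson r:", r)
```

In this toy setting the learned weights indicate which feature dimensions drive the simulated dissimilarities, which is the sense in which a learned metric can point to acoustic correlates. Real ratings are noisy, so the fit would be evaluated by correlation with held-out data rather than by exact recovery.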
