A Comparison of Approaches to Timbre Descriptors in Music Information Retrieval and Music Psychology

A curious divide characterizes the use of audio descriptors for timbre research in music information retrieval (MIR) and music psychology. While MIR uses a multitude of audio descriptors for tasks such as automatic instrument classification, parts of music psychology use only a highly constrained set to describe the physical correlates of timbre perception. We argue that this gap is not coincidental and results from differences in the two fields' methodologies, epistemic groundwork, and research goals. This paper lays out perspectives on the emergence of the divide and reviews studies in both fields with regard to divergences in research methods and goals. We discuss new representations for spectro-temporal modulations in MIR and psychology, and compare approaches to spectral envelope description in depth. Finally, we propose that the interdisciplinary discourse on the computational modelling of music requires negotiations about the roles of scientific evaluation criteria.
