Ten Experiments on the Modeling of Polyphonic Timbre. (Dix Expériences sur la Modélisation du Timbre Polyphonique)

The majority of systems extracting high-level music descriptions from audio signals rely on a common, implicit model of the global sound or polyphonic timbre of a musical signal. This model represents the timbre of a texture as the long-term distribution of its local spectral features. The underlying assumption is rarely made explicit: the perception of the timbre of a texture is assumed to result from the most statistically significant feature windows. This thesis questions the validity of this assumption. To do so, we construct an explicit measure of the timbre similarity between polyphonic music textures, and variants thereof inspired by previous work in Music Information Retrieval. We show that the precision of such measures is bounded, and that the remaining error rate is not incidental. Notably, this class of algorithms tends to create false positives, which we call hubs, that are almost always the same songs regardless of the query. The study of these hubs shows that the perceptual saliency of feature observations is not necessarily correlated with their statistical significance with respect to the global distribution. In other words, music listeners routinely “hear” things that are not statistically significant in musical signals, but rather are the result of high-level cognitive reasoning, which depends on cultural expectations, a priori knowledge, and context. Much of the music we hear as being “piano music” is really music that we expect to be piano music. Such statistical/perceptual paradoxes are instrumental in the observed discrepancy between human perception of timbre and the models studied here.
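The model described above can be illustrated with a minimal sketch. It is not the measure built in the thesis: it assumes each track is already available as a matrix of frame-level spectral features (one row per frame), summarizes that matrix with a single Gaussian, and compares two tracks with a symmetrised Kullback-Leibler divergence. The function names (`fit_gaussian`, `symmetric_kl`, `n_occurrences`) and the small regularisation constant are hypothetical choices made for this illustration. The last function counts how often each track appears in the nearest-neighbour lists of the other tracks, which is one plausible way to spot hub-like tracks.

```python
import numpy as np


def fit_gaussian(frames):
    """Summarise a track by the mean and covariance of its frame features.

    `frames` has shape (n_frames, n_features). The temporal order of the
    frames is discarded, which is the "long-term distribution" assumption.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    cov = np.atleast_2d(np.cov(frames, rowvar=False))
    # Small ridge (an assumption of this sketch) keeps the covariance invertible.
    cov = cov + 1e-6 * np.eye(frames.shape[1])
    return mean, cov


def symmetric_kl(model_a, model_b):
    """Symmetrised KL divergence KL(a||b) + KL(b||a) between two Gaussians."""
    mean_a, cov_a = model_a
    mean_b, cov_b = model_b
    inv_a = np.linalg.inv(cov_a)
    inv_b = np.linalg.inv(cov_b)
    diff = mean_a - mean_b
    dim = mean_a.shape[0]
    # The log-determinant terms of the two directed divergences cancel.
    return 0.5 * (
        np.trace(inv_b @ cov_a)
        + np.trace(inv_a @ cov_b)
        - 2 * dim
        + diff @ (inv_a + inv_b) @ diff
    )


def n_occurrences(distances, k):
    """Count how often each track appears in the k nearest neighbours of others.

    `distances` is a square matrix of pairwise distances. Tracks with an
    unusually high count behave like hubs.
    """
    distances = np.array(distances, dtype=float)
    np.fill_diagonal(distances, np.inf)  # a track is not its own neighbour
    neighbours = np.argsort(distances, axis=1)[:, :k]
    return np.bincount(neighbours.ravel(), minlength=distances.shape[0])
```

Because the Gaussian weights every frame equally, a short but perceptually salient event contributes very little to the model, which is the kind of mismatch between statistical significance and perceptual saliency that the abstract points to.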
