Modeling timbre distance with temporal statistics from polyphonic music

Timbre distance and similarity express the phenomenon that some pieces of music sound similar to us while others sound very different. The notion of genre is often used to categorize music, but songs from a single genre do not necessarily sound similar, and vice versa. In this work, we analyze and compare a large number of different audio features, and psychoacoustic variants thereof, for the purpose of modeling timbre distance. The sound of polyphonic music is commonly described by extracting audio features on short time windows during which the sound is assumed to be stationary. The resulting downsampled time series are aggregated into a high-level feature vector describing the music. We generate high-level features by systematically applying static and temporal statistics for this aggregation; the temporal structure of the features in particular has previously been largely neglected. A novel supervised feature selection method is applied to the resulting large set of candidate features. Distances computed on the selected features correspond to timbre differences in music. The selected features show little redundancy and have high potential for explaining possible clusters. They outperform seven previously proposed feature sets on several datasets with respect to separating known groups of timbrally different music.
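
As a rough illustration of the pipeline described above, the following Python sketch extracts short-time features and aggregates them with both static statistics (mean and standard deviation per coefficient) and simple temporal statistics (statistics of the frame-to-frame differences). It assumes the librosa library for MFCC extraction; the choice of MFCCs and of these particular statistics is a placeholder for illustration, not the exact feature set or aggregations evaluated and selected in the paper.

    # Minimal sketch: short-time feature extraction followed by static and
    # temporal aggregation into one high-level feature vector per song.
    # Assumes librosa is installed; MFCCs and the statistics below are
    # illustrative stand-ins, not the paper's selected feature set.
    import numpy as np
    import librosa

    def describe_song(path, n_mfcc=13):
        # Load audio and extract MFCCs on short, approximately stationary windows.
        y, sr = librosa.load(path, sr=22050, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)

        # Static statistics: summarize each feature dimension over all frames,
        # discarding temporal order.
        static = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

        # Temporal statistics: summarize the temporal structure of each feature
        # time series, here via its frame-to-frame differences.
        diff = np.diff(mfcc, axis=1)
        temporal = np.concatenate([np.abs(diff).mean(axis=1), diff.std(axis=1)])

        # High-level feature vector describing the whole song.
        return np.concatenate([static, temporal])

    def timbre_distance(path_a, path_b):
        # Timbre distance modeled as Euclidean distance between song vectors.
        return float(np.linalg.norm(describe_song(path_a) - describe_song(path_b)))

In this sketch the static part ignores the ordering of frames entirely, while the temporal part retains a crude summary of how the features evolve over time, which is the aspect the paper argues has previously been largely neglected.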
