MusicMiner : Visualizing timbre distances of music as topographical maps

Timbre distances and similarities express the phenomenon that some music sounds similar to us while other songs sound very different. The notion of genre is often used to categorize music, but songs from a single genre do not necessarily sound similar, and vice versa. Instead, we aim at visualizing timbre similarities of sound within a music collection. We analyzed and compared a large number of different audio features, and psychoacoustic variants thereof, for the purpose of modeling the timbre distance of sound. The sound of polyphonic music is commonly described by extracting audio features on short time windows during which the sound is assumed to be stationary. The resulting downsampled time series are aggregated to form a high-level feature vector describing the music. We generated high-level features by systematically applying static and temporal statistics for this aggregation; the temporal structure of the features, in particular, has previously been largely neglected. A novel supervised feature selection method is applied to the huge set of candidate features. Distances between vectors of the selected features correspond to timbre differences in music. The features show few redundancies and have high potential for explaining possible clusters. They outperform seven previously proposed feature sets on several datasets with respect to the separation of known groups of timbrally different music. Clustering and visualization based on these feature vectors can discover emergent structures in collections of music. Visualization based on Emergent Self-Organizing Maps, in particular, enables the unsupervised discovery of timbrally consistent clusters that may or may not correspond to musical genres and artists. We demonstrate the visualization capabilities of the U-Map and related methods based on the new audio features. Intuitive browsing of large music collections is offered based on the paradigm of topographic maps. The user can navigate the sound space and interact with the maps to play music or show the context of a song.

Data Bionics Research Group, Philipps-University Marburg, 35032 Marburg, Germany
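To make the described pipeline concrete, the following is a minimal sketch of short-time feature extraction, aggregation with static and temporal statistics, and a resulting timbre distance. It is not the authors' implementation: the choice of MFCCs via librosa, the particular statistics, and helper names such as song_vector and timbre_distance are illustrative assumptions only.

```python
# Minimal sketch (not the authors' implementation) of the described pipeline:
# short-time audio features are aggregated with static and temporal statistics
# to form one high-level vector per song. Assumes numpy, scipy, and librosa.
import numpy as np
import librosa
from scipy.stats import skew, kurtosis


def short_time_features(path, n_mfcc=13):
    """Extract a downsampled time series of short-time features (here: MFCCs)."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    # One feature vector per ~23 ms window; the sound is assumed stationary within a window.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, n_fft=512, hop_length=512)


def aggregate(ts):
    """Static and temporal statistics over one feature time series (1-D array)."""
    static = [ts.mean(), ts.std(), skew(ts), kurtosis(ts)]
    # Temporal structure: statistics of frame-to-frame differences and the
    # low-frequency content of the feature trajectory (simple stand-ins for the
    # temporal descriptors discussed in the paper).
    diff = np.diff(ts)
    spectrum = np.abs(np.fft.rfft(ts - ts.mean()))
    temporal = [diff.mean(), diff.std(), spectrum[:10].sum() / (spectrum.sum() + 1e-12)]
    return static + temporal


def song_vector(path):
    """High-level feature vector describing one song."""
    feats = short_time_features(path)
    return np.hstack([aggregate(row) for row in feats])


def timbre_distance(vec_a, vec_b, selected=None):
    """Euclidean distance between (optionally feature-selected) song vectors."""
    if selected is not None:
        vec_a, vec_b = vec_a[selected], vec_b[selected]
    return float(np.linalg.norm(vec_a - vec_b))
```

In the paper, a supervised feature selection step would retain only a small, non-redundant subset of the aggregated features before such distances are computed and fed to the Emergent SOM for visualization.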
