Deploying Deep Belief Nets for content based audio music similarity

This paper presents a method for computing audio-based similarity between music excerpts. The method consists of three main parts. The first is feature extraction, which computes three feature sets corresponding to music timbre, rhythm, and harmony. Next, for each feature set, a Deep Belief Network is trained without supervision on a large music collection. Finally, for two music excerpts, the distances between the output units of the respective Deep Belief Networks are computed, normalized, and combined to form the distance measure. The proposed method was evaluated on the MIREX 2013 Audio Music Similarity task. The results are encouraging, but they indicate that the harmonic similarity component degrades performance.
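
The final step (per-feature distances that are normalized and then combined) can be illustrated with a minimal sketch. The abstract does not specify the distance metric, the normalization, or the combination rule, so everything concrete here is an assumption made for illustration: Euclidean distance, z-score normalization over off-diagonal entries, and a weighted sum. The function names and the random stand-in arrays are hypothetical and do not come from the paper; real inputs would be the output-unit activations of the trained Deep Belief Networks.

```python
import numpy as np

def pairwise_euclidean(codes):
    """Return the (n, n) matrix of Euclidean distances between rows of `codes`."""
    diff = codes[:, None, :] - codes[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def normalize(dist):
    """Z-score distances using off-diagonal statistics (assumed normalization)."""
    off_diagonal = dist[~np.eye(dist.shape[0], dtype=bool)]
    std = off_diagonal.std()
    return (dist - off_diagonal.mean()) / (std if std > 0 else 1.0)

def combined_distance(codes_by_feature, weights=None):
    """Weighted sum of normalized per-feature distance matrices (assumed rule)."""
    names = sorted(codes_by_feature)
    if weights is None:
        weights = {name: 1.0 for name in names}
    return sum(
        weights[name] * normalize(pairwise_euclidean(codes_by_feature[name]))
        for name in names
    )

# Random stand-ins for DBN output activations: 4 excerpts, 8 units per network.
rng = np.random.default_rng(0)
codes = {name: rng.random((4, 8)) for name in ("timbre", "rhythm", "harmony")}

full = combined_distance(codes)
# Setting the harmony weight to zero removes that component from the measure.
no_harmony = combined_distance(codes, {"timbre": 1.0, "rhythm": 1.0, "harmony": 0.0})
```

Under these assumptions, a smaller entry in `full` means a more similar pair of excerpts. Because of the z-scoring, diagonal entries are negative rather than zero, so only off-diagonal entries should be ranked. Keeping per-feature weights explicit makes it easy to test the effect of a single component, such as the harmonic one, on the combined measure.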
