Enhancing timbre model using MFCC and its time derivatives for music similarity estimation

One popular method for content-based music similarity estimation models timbre as a single multivariate Gaussian with a full covariance matrix fitted to a song's MFCCs, and compares songs with the symmetric Kullback-Leibler divergence. Borrowing from speech recognition, we propose applying the same approach to the MFCCs' time derivatives to enhance the timbre model. Gaussian models of the delta and acceleration coefficients are used to build their own distance matrices. These distance matrices are then combined linearly with the MFCC distance matrix to form a full distance matrix for music similarity estimation. In experiments on two datasets, our approach performs better than using MFCCs alone. Moreover, k-NN genre classification with the combined distance yields accuracies close to the state of the art.
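
The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes NumPy and MFCC frames that have already been extracted (shape frames × coefficients). It also assumes the common regression formula for deltas with a window of 2, equal weights for the linear combination, and leave-one-out k-NN voting. The window, the weights, k, any normalization of the matrices, and every function name here are hypothetical choices, not settings taken from the paper.

```python
from collections import Counter

import numpy as np


def deltas(features, width=2):
    """Regression-based time derivative of each coefficient track (assumed formula).

    features: array of shape (n_frames, n_coeffs). Edge frames are repeated,
    so the output has the same shape as the input.
    """
    n = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, width + 1):
        # c[t + k] - c[t - k] for every frame t at once
        out += k * (padded[width + k:width + k + n] - padded[width - k:width - k + n])
    return out / denom


def fit_gaussian(features):
    """Single Gaussian with full covariance: (mean vector, covariance matrix)."""
    return features.mean(axis=0), np.atleast_2d(np.cov(features, rowvar=False))


def symmetric_kl(g1, g2):
    """KL(g1 || g2) + KL(g2 || g1); the log-determinant terms cancel."""
    (m1, c1), (m2, c2) = g1, g2
    i1, i2 = np.linalg.inv(c1), np.linalg.inv(c2)
    diff = m1 - m2
    d = len(m1)
    return 0.5 * (np.trace(i2 @ c1) + np.trace(i1 @ c2)
                  + diff @ (i1 + i2) @ diff - 2 * d)


def distance_matrix(models):
    """Pairwise symmetric KL; only the upper triangle is computed."""
    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = symmetric_kl(models[i], models[j])
    return dist


def combined_distance(songs, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the MFCC, delta, and acceleration distance matrices.

    songs: list of MFCC arrays, each of shape (n_frames, n_coeffs).
    weights: hypothetical (mfcc, delta, acceleration) weights.
    """
    streams = ([], [], [])
    for mfcc in songs:
        delta = deltas(mfcc)
        accel = deltas(delta)  # acceleration = delta of the deltas
        for models, feats in zip(streams, (mfcc, delta, accel)):
            models.append(fit_gaussian(feats))
    return sum(w * distance_matrix(models) for w, models in zip(weights, streams))


def knn_predict(dist, labels, k=5):
    """Leave-one-out k-NN: majority label among each song's k nearest others."""
    preds = []
    for i in range(len(labels)):
        nearest = [j for j in np.argsort(dist[i]) if j != i][:k]
        votes = Counter(labels[j] for j in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

The symmetric KL divergence is symmetric by construction, so each stream's matrix needs only the upper triangle. The linear combination lives entirely in `weights`. Tuning those weights, or rescaling the matrices before summing them, is where the relative contribution of the delta and acceleration models would be controlled.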
