Music classification via the bag-of-features approach

Audio-based music classification is a central problem in music information retrieval. Current music classification systems follow a frame-based analysis model: a song is split into short frames, and a feature vector is extracted from each frame, so that each song is represented by a set of feature vectors. How to use this feature set for global, song-level classification is an important open question. Previous studies have relied on summary features and probability models, which are either too restrictive in modeling power or numerically too difficult to solve. In this paper, we investigate the bag-of-features approach to music classification, which can effectively aggregate the local features into a song-level feature representation. We also extend the standard bag-of-features approach by proposing a multiple codebook model that exploits the randomness in codebook generation. Experimental results for genre classification and artist identification on benchmark data sets show that the proposed classification system is highly competitive with standard methods.
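The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the frame features, the clustering algorithm, the codebook size, or how the multiple codebooks are combined. The sketch assumes plain k-means codebooks that differ only in their random seed, normalized codeword histograms as the song-level representation, and simple concatenation across codebooks. The function names (`build_codebook`, `encode_song`, `multi_codebook_features`) and the synthetic data are hypothetical.

```python
import numpy as np


def build_codebook(frames, k, seed, n_iter=20):
    """Learn k codewords with plain k-means (Lloyd's algorithm).

    The random initialisation depends on `seed`, so different seeds
    generally give different codebooks (assumed source of randomness).
    """
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every codeword.
        dists = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:  # keep the old codeword if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def encode_song(song_frames, codebook):
    """Bag-of-features: normalized histogram of nearest-codeword assignments."""
    dists = ((song_frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()


def multi_codebook_features(song_frames, codebooks):
    """Concatenate the histograms from several codebooks (an assumption)."""
    return np.concatenate([encode_song(song_frames, cb) for cb in codebooks])


# Synthetic stand-in for frame-level features: 4 songs, 200 frames, 13 dims.
rng = np.random.default_rng(0)
songs = [rng.normal(size=(200, 13)) for _ in range(4)]
all_frames = np.vstack(songs)

# Three codebooks of 8 codewords each, differing only in the random seed.
codebooks = [build_codebook(all_frames, k=8, seed=s) for s in range(3)]
features = np.array([multi_codebook_features(s, codebooks) for s in songs])
print(features.shape)
```

Each row of `features` is a fixed-length song-level vector that could be passed to any standard classifier. An alternative to concatenation would be to train one classifier per codebook and combine their predictions; which of these the paper uses is not stated in the abstract.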
