MIREX 2011 SUBMISSION - COMBINING VISUAL AND ACOUSTIC FEATURES FOR MUSIC GENRE CLASSIFICATION

ABSTRACT

The system uses two types of effective features for genre classification. Visual features capture the characteristics of a spectrogram’s texture patterns. Acoustic features are extracted using a universal background model and maximum a posteriori adaptation. Based on these two types of features, we then employ a support vector machine (SVM) to perform the final classification task.

1. INTRODUCTION

Since the effectiveness of the GSV (Gaussian Super Vector) was demonstrated in MIREX 2009 [1,2], here we combine visual features with the GSV for genre classification. This system was described in [3]. For a detailed explanation, please see the original paper.

2. ACOUSTIC FEATURES

Here we follow the method in [2]. First, a universal background model (UBM) is trained on a large music dataset, using a Gaussian mixture model (GMM) to represent the common distribution of short-term features (e.g. MFCCs). The music collection consists of nearly 2000 music clips covering different genres. The number of Gaussian mixture components is set to 30. Next, for a particular music clip, we take the UBM as a prior distribution and use maximum a posteriori (MAP) adaptation to establish the corresponding GMM. Each music clip can thus be represented by a set of GMM parameters called the GSV.
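The MAP adaptation step in Section 2 can be illustrated with a minimal sketch. It is not the submitted implementation and rests on several assumptions that the text does not state: the UBM is taken as already trained with diagonal covariances, only the component means are adapted, the GSV is formed by stacking the adapted means, and a relevance factor of 16 controls how far the means move. The function names are hypothetical.

```python
import numpy as np


def log_gauss_diag(frames, means, variances):
    """Log-density of every frame under every diagonal-covariance Gaussian.

    frames: (T, D), means: (K, D), variances: (K, D); returns (T, K).
    """
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    log_det = np.sum(np.log(variances), axis=1)                 # (K,)
    dim = frames.shape[1]
    return -0.5 * (dim * np.log(2.0 * np.pi) + log_det[None, :] + mahal)


def map_adapt_supervector(frames, weights, means, variances, relevance=16.0):
    """Adapt the UBM means to one clip and stack them into a supervector.

    Assumptions (not from the paper): means-only adaptation and a
    relevance factor of 16.
    """
    # Posterior probability of each component for each frame.
    log_post = np.log(weights)[None, :] + log_gauss_diag(frames, means, variances)
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # (T, K)

    # Zeroth- and first-order sufficient statistics of the clip.
    n_k = post.sum(axis=0)                            # (K,)
    f_k = post.T @ frames                             # (K, D)
    clip_means = f_k / np.maximum(n_k, 1e-10)[:, None]

    # Interpolate between the clip statistics and the UBM prior.
    alpha = (n_k / (n_k + relevance))[:, None]        # (K, 1)
    adapted = alpha * clip_means + (1.0 - alpha) * means
    return adapted.reshape(-1)                        # length K * D


if __name__ == "__main__":
    # Synthetic stand-ins: a 30-component UBM over 13-dimensional frames.
    rng = np.random.default_rng(0)
    ubm_weights = rng.dirichlet(np.ones(30))
    ubm_means = rng.normal(size=(30, 13))
    ubm_variances = np.ones((30, 13))
    clip_frames = rng.normal(size=(200, 13))
    gsv = map_adapt_supervector(clip_frames, ubm_weights, ubm_means, ubm_variances)
    print(gsv.shape)
```

Components that explain many frames of the clip move toward the clip's own statistics, while components with little support stay close to the UBM, so every clip yields a vector of the same length regardless of its duration.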

[1] Jyh-Shing Roger Jang, et al. Combining Visual and Acoustic Features for Music Genre Classification. 2011 10th International Conference on Machine Learning and Applications and Workshops, 2011.