Combining Visual and Acoustic Features for Music Genre Classification

Music genre classification is a challenging task in music information retrieval. Existing approaches usually extract features only from the acoustic signal. However, the spectrogram also provides useful information, because it describes how the energy distribution over frequency bins changes over time. In this paper, we propose using Gabor filters to generate visual features that capture the texture patterns of a spectrogram. In parallel, acoustic features are extracted using a universal background model (UBM) and maximum a posteriori (MAP) adaptation. Based on these two types of features, we then employ a support vector machine (SVM) to perform the final classification. Experimental results demonstrate that combining visual and acoustic features achieves satisfactory classification accuracy on two widely used datasets.
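The two feature types described above can be illustrated with a minimal NumPy sketch. It is not the configuration used in the paper: the kernel size, scales, orientations, relevance factor, diagonal-covariance UBM, and the plain concatenation of the two feature vectors are all assumptions made for illustration, and the function names (`gabor_kernel`, `gabor_texture_features`, `map_adapt_means`) are hypothetical. The UBM here is random rather than trained, and the spectrogram and frames are synthetic.

```python
import numpy as np

def gabor_kernel(sigma, theta, wavelength, size=15):
    """Real-valued Gabor kernel: Gaussian envelope times an oriented cosine."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength)

def filter_fft(image, kernel):
    """Circular 2-D convolution via FFT (wrap-around at the borders is ignored)."""
    shape = image.shape
    product = np.fft.fft2(image, s=shape) * np.fft.fft2(kernel, s=shape)
    return np.real(np.fft.ifft2(product))

def gabor_texture_features(spectrogram, sigmas=(2.0, 4.0), n_orientations=4):
    """Mean and standard deviation of each filter response magnitude."""
    feats = []
    for sigma in sigmas:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(sigma, theta, wavelength=4.0 * sigma)
            response = np.abs(filter_fft(spectrogram, kernel))
            feats.extend([response.mean(), response.std()])
    return np.array(feats)

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """MAP adaptation of the UBM means only (diagonal covariances assumed)."""
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_prob = -0.5 * np.sum(
        diff**2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    log_post = np.log(weights) + log_prob                      # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)            # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                    # component posteriors
    n_k = post.sum(axis=0)                                     # soft counts, (K,)
    data_mean = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]                 # adaptation weight
    return alpha * data_mean + (1.0 - alpha) * means

# Synthetic stand-ins for one clip: a spectrogram and per-frame acoustic features.
rng = np.random.default_rng(0)
spectrogram = rng.random((64, 128))
frames = rng.normal(size=(200, 5))

# Untrained, hypothetical 3-component UBM used only to exercise the code.
ubm_weights = np.full(3, 1.0 / 3.0)
ubm_means = rng.normal(size=(3, 5))
ubm_vars = np.ones((3, 5))

visual = gabor_texture_features(spectrogram)
acoustic = map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars).ravel()
combined = np.concatenate([visual, acoustic])
```

Under these assumptions, `combined` is a single fixed-length vector per clip that could be passed to any SVM implementation. Concatenation is only one way of combining the two feature types; the abstract does not specify which fusion scheme is used.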
