A comparative study on content-based music genre classification

Content-based music genre classification is a fundamental component of music information retrieval systems and has attracted growing attention with the emergence of digital music on the Internet. To date, relatively little work has been done on automatic music genre classification, and the reported classification accuracies remain relatively low. This paper proposes a new feature extraction method for music genre classification: Daubechies Wavelet Coefficient Histograms (DWCHs). DWCHs capture the local and global information of music signals simultaneously by computing histograms over their Daubechies wavelet coefficients. The effectiveness of this new feature and of previously studied features is compared using various machine learning classification algorithms, including Support Vector Machines and Linear Discriminant Analysis. It is demonstrated that the use of DWCHs significantly improves the accuracy of music genre classification.
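To make the feature concrete, the following is a minimal sketch of DWCH-style extraction in Python using the PyWavelets library. The particular wavelet (db8), decomposition depth, histogram bin count, and summary statistics (first three moments of the coefficient histogram plus subband energy) are illustrative assumptions, not necessarily the exact configuration used in the paper.

```python
import numpy as np
import pywt

def dwch_features(signal, wavelet="db8", level=7, bins=64):
    """Daubechies Wavelet Coefficient Histogram (DWCH) features -- a sketch.

    Each subband of a Daubechies wavelet decomposition is turned into a
    normalized coefficient histogram, which is summarized by its first
    three moments plus the subband energy. Wavelet, depth, bin count, and
    the summary statistics are illustrative choices.
    """
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    features = []
    for band in coeffs:
        counts, edges = np.histogram(band, bins=bins)
        probs = counts / counts.sum()                 # normalized histogram
        centers = (edges[:-1] + edges[1:]) / 2.0      # histogram bin centers
        mean = np.sum(probs * centers)                # 1st moment
        var = np.sum(probs * (centers - mean) ** 2)   # 2nd central moment
        skew = np.sum(probs * (centers - mean) ** 3)  # 3rd central moment
        energy = np.mean(band ** 2)                   # subband energy
        features.extend([mean, var, skew, energy])
    return np.asarray(features)

# Example: featurize 3 seconds of a 22,050 Hz mono signal (random stand-in for audio).
x = np.random.randn(3 * 22050)
print(dwch_features(x).shape)  # (level + 1) subbands x 4 statistics per subband
```

The resulting fixed-length vector can then be fed to any of the classifiers compared in the paper, such as a Support Vector Machine or Linear Discriminant Analysis.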
