Discrete wavelet packet transform and ensembles of lazy and eager learners for music genre classification

This paper presents a process for determining the music genre of an item using a new set of descriptors. A discrete wavelet packet transform is applied to obtain the signal representation at two different resolutions: a frequency resolution and a time resolution, tuned to encode music notes and their onsets and offsets. These features are tested on a number of data sets as descriptors for music genre classification. Lazy learners (k-nearest neighbor) and eager learners (neural networks and support vector machines) are applied to assess the classification power of the proposed features. Different feature selection techniques and ensemble methods are explored to maximize the accuracy of the classifiers and stabilize their behavior. Our evaluation shows that these frequency descriptors perform better in music genre classification than a standard approach based on Mel-Frequency Cepstral Coefficients and the Short Time Fourier Transform. Moreover, our work confirms that a parameterization of the music rhythm based on the beat histogram provides meaningful information for classifying music by genre. Finally, our evaluation suggests that multi-class support vector machines with a linear kernel and round-robin binarization are the simplest and most effective approach to music genre classification.
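To make the wavelet packet idea concrete, here is a minimal sketch of a full wavelet packet decomposition followed by per-band energy summaries. It makes several assumptions not stated in the abstract: the Haar filter is used only because it is the simplest orthogonal wavelet, the function names (`haar_step`, `wavelet_packet`, `band_energies`) are hypothetical, and sub-band energy is an illustrative descriptor rather than the feature set the paper proposes.

```python
import math


def haar_step(signal):
    """One level of the Haar analysis filter bank: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet(signal, levels):
    """Full wavelet packet tree: unlike the plain DWT, BOTH the approximation
    and the detail branch are split again at every level.
    Returns 2**levels sub-bands in filter-bank (natural) order, which is
    not the same as increasing-frequency order."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    bands = [list(signal)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            approx, detail = haar_step(band)
            next_bands.extend([approx, detail])
        bands = next_bands
    return bands


def band_energies(bands):
    """Sum of squared coefficients per sub-band (an illustrative descriptor)."""
    return [sum(c * c for c in band) for band in bands]


# Example: a constant signal keeps all of its energy in the first sub-band.
example_bands = wavelet_packet([1.0] * 8, levels=2)
example_energies = band_energies(example_bands)
```

Increasing `levels` narrows each sub-band (finer frequency resolution) while leaving fewer coefficients per band (coarser time resolution), which is the trade-off behind computing descriptors at two different resolutions. Because the Haar transform is orthogonal, the sub-band energies always sum to the energy of the input signal.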
