Ensemble Discriminant Sparse Projections Applied to Music Genre Classification

Exploiting the rich, psycho-physiologically grounded properties of the slow temporal modulations of music recordings, a novel classifier ensemble is built that applies discriminant sparse projections. More specifically, overcomplete dictionaries are learned and sparse coefficient vectors are extracted that optimally approximate the slow temporal modulations of the training music recordings. The sparse coefficient vectors are then projected onto the principal subspaces of their within-class and between-class covariance matrices. Decisions are made according to the minimum Euclidean distance from the class-mean sparse coefficient vectors, which undergo the same projections. Applying majority voting to the decisions of 10 individual classifiers, each trained on one of the 10 training folds defined by stratified 10-fold cross-validation on the GTZAN dataset, yields an average music genre classification accuracy of 84.96%. This exceeds by 2.46% the highest accuracy previously reported without employing any sparse representations.
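
The sketch below makes the pipeline described above concrete under several assumptions it does not take from the paper: scikit-learn's DictionaryLearning with OMP coding stands in for the dictionary learning and sparse coding stage, the discriminant projection is one possible reading of "projection onto the principal subspaces of the within-class and between-class covariance matrices", and the subspace dimensions, atom counts, and helper names (discriminant_projection, fit_fold, ensemble_predict) are illustrative. It assumes the slow temporal modulation features have already been extracted into a matrix X with integer-coded genre labels y.

```python
# Illustrative sketch (not the authors' implementation): sparse coding with an
# overcomplete dictionary, a discriminant projection of the sparse codes,
# nearest-class-mean decisions, and majority voting over fold-wise classifiers.
import numpy as np
from scipy import linalg
from sklearn.decomposition import DictionaryLearning
from sklearn.model_selection import StratifiedKFold


def discriminant_projection(codes, labels, n_within=40, n_between=9):
    """Project sparse codes onto principal subspaces of the within-class and
    between-class covariance matrices (dimensions here are illustrative)."""
    classes = np.unique(labels)
    overall_mean = codes.mean(axis=0)
    d = codes.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = codes[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - overall_mean, mc - overall_mean)
    # Principal subspace of the within-class covariance, with whitening.
    wvals, wvecs = linalg.eigh(Sw)
    order = np.argsort(wvals)[::-1][:n_within]
    W1 = wvecs[:, order] / np.sqrt(wvals[order] + 1e-12)
    # Principal subspace of the between-class covariance in that space.
    Sb_p = W1.T @ Sb @ W1
    bvals, bvecs = linalg.eigh(Sb_p)
    W2 = bvecs[:, np.argsort(bvals)[::-1][:n_between]]
    return W1 @ W2


def fit_fold(X_train, y_train, n_atoms=512, n_nonzero=10):
    """Learn an overcomplete dictionary (stand-in for K-SVD), sparse-code the
    training data, and store the class means needed for minimum-distance decisions."""
    dico = DictionaryLearning(n_components=n_atoms,
                              transform_algorithm='omp',
                              transform_n_nonzero_coefs=n_nonzero)
    codes = dico.fit_transform(X_train)
    P = discriminant_projection(codes, y_train)
    projected = codes @ P
    classes = np.unique(y_train)
    means = np.vstack([projected[y_train == c].mean(axis=0) for c in classes])
    return dico, P, classes, means


def predict_fold(model, X):
    """Assign each sample to the class whose projected mean is nearest in Euclidean distance."""
    dico, P, classes, means = model
    projected = dico.transform(X) @ P
    dists = np.linalg.norm(projected[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def ensemble_predict(X, y, X_test, n_folds=10, seed=0):
    """Train one classifier per stratified training fold and combine their
    decisions on the test data by majority voting."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    votes = []
    for train_idx, _ in skf.split(X, y):
        model = fit_fold(X[train_idx], y[train_idx])
        votes.append(predict_fold(model, X_test))
    votes = np.vstack(votes)  # shape: (n_folds, n_test)
    # Majority vote across the fold-wise classifiers (labels must be 0..C-1).
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

The nearest-class-mean rule and the two-stage eigendecomposition are kept deliberately simple; dimensions such as n_atoms, n_within, and n_between would need tuning for any real experiment.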
