A tensor-based approach for automatic music genre classification

Most music genre classification techniques employ pattern recognition algorithms to classify feature vectors extracted from recordings into genres. An automatic music genre classification system using tensor representations is proposed, in which each recording is represented by a matrix of features over time. A feature tensor is then created by concatenating the feature matrices associated with the recordings. A novel algorithm for non-negative tensor factorization (NTF) is developed, which employs the Frobenius norm as the distance between an n-dimensional raw feature tensor and its decomposition into a sum of elementary rank-1 tensors. Moreover, a supervised NTF classifier is proposed. A variety of sound description features are extracted from the recordings of the GTZAN dataset, which covers 10 genre classes. The performance of the NTF classifier is compared against that of multilayer perceptrons, support vector machines, and non-negative matrix factorization classifiers. On average, a genre classification accuracy of 75% with a standard deviation of 1% is achieved. It is demonstrated that NTF classifiers outperform matrix-based ones.
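
To make the factorization idea concrete, the following is a minimal sketch of a Frobenius-norm NTF for a 3-way non-negative tensor, written with NumPy. It is not the novel algorithm of the paper: it uses the standard Lee-Seung-style multiplicative update applied to each mode in turn, and the function names (`khatri_rao`, `unfold`, `reconstruct`, `ntf`), the rank, the iteration count, and the synthetic tensor are all assumptions made for illustration.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product: (J x R) and (K x R) -> (J*K x R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def unfold(X, mode):
    """Mode-n unfolding; the remaining axes are flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def reconstruct(factors):
    """Sum of rank-1 tensors built from the columns of the three factors."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def ntf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative 3-way tensor X by `rank` rank-1 terms.

    Hypothetical helper: each factor is updated multiplicatively so that
    it stays non-negative while the Frobenius error is driven down.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) + 0.1 for dim in X.shape]
    errors = []
    for _ in range(n_iter):
        for mode in range(3):
            # The two factors that are held fixed for this update.
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(X, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + eps
            factors[mode] *= numer / denom
        errors.append(np.linalg.norm(X - reconstruct(factors)))
    return factors, errors


# Synthetic example: a small non-negative tensor of exact rank 3.
rng = np.random.default_rng(1)
true_factors = [rng.random((dim, 3)) for dim in (6, 5, 4)]
X = reconstruct(true_factors)
factors, errors = ntf(X, rank=3)
```

In a genre classification setting, the three axes could for instance index recordings, features, and time frames, but that layout is an assumption here rather than something the abstract specifies. The list `errors` records the Frobenius norm of the residual after each sweep, which gives a simple way to check convergence.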
