Music Genre Classification Using Stacked Auto-Encoders

In this paper, we propose a stacked auto-encoder (SAE) architecture for music genre classification. Each level of the stacked architecture concatenates the hidden representations produced by the previous level for different frames of the input signal. In this way, the proposed architecture achieves more robust classification than a standard SAE. The first level of the SAE is fed with a set of 57 features extracted from the music signals. Experimental results show the effectiveness of the proposed approach with respect to other state-of-the-art methods; in particular, the proposed architecture is compared with a support vector machine (SVM), a multi-layer perceptron (MLP) and logistic regression (LR).
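
As a rough illustration of the frame-stacking idea described above, the sketch below encodes per-frame feature vectors with a first auto-encoder level, concatenates the resulting hidden codes across frames, and feeds the stacked vector to a second level. Only the 57-dimensional input comes from the abstract; the layer sizes, window length, class names and the absence of any training loop are assumptions of this sketch, not the authors' implementation.

```python
# Minimal sketch of a frame-stacking stacked auto-encoder (hypothetical sizes;
# training by reconstruction-error minimization is omitted for brevity).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AutoEncoderLayer:
    """One auto-encoder level: encodes an input vector into a hidden code."""
    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
        self.b = np.zeros(n_hidden)
        self.W_dec = rng.normal(0.0, 0.01, size=(n_hidden, n_in))
        self.b_dec = np.zeros(n_in)

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def decode(self, h):
        # Reconstruction path, used only when training the layer.
        return sigmoid(h @ self.W_dec + self.b_dec)

rng = np.random.default_rng(0)
n_features = 57   # per-frame feature dimension, as in the paper
n_frames = 4      # assumed number of frames stacked per window
frames = rng.normal(size=(n_frames, n_features))  # stand-in for extracted features

# Level 1: encode each frame independently.
level1 = AutoEncoderLayer(n_features, 30, rng)
h1 = level1.encode(frames)            # shape: (n_frames, 30)

# Level 2: stack (concatenate) the hidden codes of the frames and encode again.
stacked = h1.reshape(-1)              # shape: (n_frames * 30,)
level2 = AutoEncoderLayer(stacked.size, 40, rng)
h2 = level2.encode(stacked)           # higher-level representation for the classifier
print(h2.shape)                       # (40,)
```

In this sketch the final code `h2` would be passed to a genre classifier (e.g. a softmax layer), mirroring the comparison against SVM, MLP and LR mentioned in the abstract.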
