An evaluation of Convolutional Neural Networks for music classification using spectrograms

Highlights: Music classification using spectrograms and Convolutional Neural Networks. Comparison of results with the state of the art on the Latin Music Database, the ISMIR 2004 dataset, and an African music collection. Assessment of the complementarity between Convolutional Neural Networks and classifiers built with handcrafted features.

Music genre recognition based on visual representations has been successfully explored in recent years. Classifiers trained with textural descriptors (e.g., Local Binary Patterns, Local Phase Quantization, and Gabor filters) extracted from spectrograms have achieved state-of-the-art results on several music datasets. In this work, we argue that time-frequency analysis can be taken further through representation learning. To show this, we compare the results obtained with a Convolutional Neural Network (CNN) against those obtained with handcrafted features and SVM classifiers. In addition, we performed experiments fusing the results obtained with learned and handcrafted features to assess the complementarity between these representations for the music classification task. Experiments were conducted on three music databases with distinct characteristics: a Western music collection widely used in research benchmarks (ISMIR 2004 database), a collection of Latin American music (LMD database), and a collection of field recordings of ethnic African music. Our experiments show that the CNN compares favorably to other classifiers in several scenarios, which makes it a promising alternative for music genre recognition. On the African database, the CNN surpassed both the handcrafted representations and the state of the art by a margin. On the LMD database, the combination of the CNN and Robust Local Binary Pattern achieved a recognition rate of 92%, which, to the best of our knowledge, is the best result obtained on this dataset so far using an artist filter. On the ISMIR 2004 dataset, although the CNN did not improve on the state of the art, it performed better than the classifiers built individually on the other kinds of features.
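The fusion of learned and handcrafted representations mentioned in the abstract can be illustrated with a minimal late-fusion sketch. The abstract does not state which combination rule was used, so the sum, product, and max rules below are assumptions, chosen because they are common fixed rules for combining classifier outputs. The function names (`fuse`, `predict`), the genre labels, and the probability values are hypothetical and are not results from the paper.

```python
from math import prod
from typing import List, Sequence


def fuse(scores: Sequence[Sequence[float]], rule: str = "sum") -> List[float]:
    """Combine per-class scores from several classifiers into one vector.

    `scores` holds one list of class probabilities per classifier,
    all in the same class order.
    """
    if not scores:
        raise ValueError("need at least one classifier output")
    n_classes = len(scores[0])
    if any(len(s) != n_classes for s in scores):
        raise ValueError("all classifiers must score the same classes")

    # zip(*scores) groups the scores that each classifier gave to one class.
    if rule == "sum":
        combined = [sum(col) for col in zip(*scores)]
    elif rule == "product":
        combined = [prod(col) for col in zip(*scores)]
    elif rule == "max":
        combined = [max(col) for col in zip(*scores)]
    else:
        raise ValueError(f"unknown rule: {rule}")

    # Renormalize so the fused scores sum to 1 (uniform if all are zero).
    total = sum(combined)
    if total <= 0:
        return [1.0 / n_classes] * n_classes
    return [c / total for c in combined]


def predict(scores: Sequence[Sequence[float]], labels: Sequence[str],
            rule: str = "sum") -> str:
    """Return the label with the highest fused score."""
    fused = fuse(scores, rule)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical values for illustration only (not taken from the paper).
GENRES = ["forro", "salsa", "tango"]
cnn_probs = [0.50, 0.30, 0.20]        # assumed CNN output for one track
svm_rlbp_probs = [0.20, 0.70, 0.10]   # assumed SVM output on handcrafted features

if __name__ == "__main__":
    for rule in ("sum", "product", "max"):
        print(rule, predict([cnn_probs, svm_rlbp_probs], GENRES, rule))
```

In this toy case the two classifiers disagree when used alone, and the fused decision follows the more confident one. Whether fusion helps in practice depends on how complementary the errors of the two representations are, which is the question the experiments address.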
