Music Genre Recognition Using Deep Neural Networks and Transfer Learning

Music genre recognition is an active research area within music information retrieval and audio signal processing. In this work we propose a novel approach to music genre recognition that uses an ensemble of convolutional long short-term memory (CNN-LSTM) neural networks and a transfer learning model. The neural network models are trained on a diverse set of spectral and rhythmic features, whereas the transfer learning model was originally trained on the task of music tagging. We compare our system with a number of recently published works and show that our model outperforms them and achieves new state-of-the-art results.
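The abstract does not state how the ensemble members are combined. As an illustration only, the sketch below assumes soft voting, that is, a (optionally weighted) average of each model's class probabilities. The function name `soft_vote`, the array names, and the probability values are hypothetical and are not taken from the paper.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Combine per-model class probabilities by weighted averaging.

    prob_list: list of arrays, each of shape (n_clips, n_genres).
    weights:   optional per-model weights; uniform if omitted.
    Returns the averaged probabilities and the predicted genre index per clip.
    """
    probs = np.stack(prob_list, axis=0)  # (n_models, n_clips, n_genres)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    avg = np.tensordot(weights, probs, axes=1)  # (n_clips, n_genres)
    return avg, avg.argmax(axis=1)


# Made-up outputs for 2 clips and 3 genres (assumption, for illustration).
cnn_lstm_probs = np.array([[0.6, 0.3, 0.1],
                           [0.2, 0.5, 0.3]])
transfer_probs = np.array([[0.2, 0.7, 0.1],
                           [0.1, 0.3, 0.6]])

avg, labels = soft_vote([cnn_lstm_probs, transfer_probs])
print(labels.tolist())  # [1, 2]
```

In this toy case the two models disagree on both clips, and the averaged probabilities decide the final label. Other combination schemes (majority voting, a learned meta-classifier) would fit the abstract's description equally well.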
