Paper Information - Music Genre Recognition Using Deep Neural Networks and Transfer Learning

Music Genre Recognition Using Deep Neural Networks and Transfer Learning

Music genre recognition is an active area of research within music information retrieval and audio signal processing. In this work, we propose a novel approach for music genre recognition using an ensemble of convolutional long short-term memory based neural networks (CNN LSTM) and a transfer learning model. The neural network models are trained on a diverse set of spectral and rhythmic features, whereas the transfer learning model was originally trained on the task of music tagging. We compare our system with a number of recently published works and show that our model outperforms them and achieves new state-of-the-art results.

Maheshkumar H. Kolekar | Deepanway Ghosal

[1] Maheshkumar H. Kolekar, et al. Classification of fashion article images using convolutional neural networks, 2017, 2017 Fourth International Conference on Image Information Processing (ICIIP).

[2] Meinard Müller, et al. Efficient Index-Based Audio Matching, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[3] Guojun Lu, et al. Enhanced polyphonic music genre classification using high level features, 2009, 2009 IEEE International Conference on Signal and Image Processing Applications.

[4] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5] Judith C. Brown. Calculation of a constant Q spectral transform, 1991.

[6] Lawrence D. Jackel, et al. Backpropagation Applied to Handwritten Zip Code Recognition, 1989, Neural Computation.

[7] Bob L. Sturm. The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use, 2013, ArXiv.

[8] Grzegorz Gwardys, et al. Deep Image Features in Music Information Retrieval, 2014.

[9] C. Harte, et al. Detecting harmonic change in musical audio, 2006, AMCMM '06.

[10] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Lei Wang, et al. Transfer Learning for Music Classification and Regression Tasks Using Artist Tags, 2020.

[12] Stan Davis, et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se, 1980.

[13] George Tzanetakis, et al. Musical genre classification of audio signals, 2002, IEEE Trans. Speech Audio Process.

[14] Qi Tian, et al. Musical genre classification using support vector machines, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[15] Abhishek Kumar, et al. A Multilayer Perceptron based Ensemble Technique for Fine-grained Financial Sentiment Analysis, 2017, EMNLP.

[16] Luiz S. Oliveira, et al. Music genre recognition using spectrograms, 2011, 2011 18th International Conference on Systems, Signals and Image Processing.

[17] Meinard Müller, et al. Audio Matching via Chroma-Based Statistical Features, 2005, ISMIR.

[18] N. Scaringella, et al. Automatic genre classification of music content: a survey, 2006, IEEE Signal Process. Mag.

[19] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[20] Bob L. Sturm. A Survey of Evaluation in Music Genre Recognition, 2012, Adaptive Multimedia Retrieval.

[21] Constantine Kotropoulos, et al. Music genre classification via Topology Preserving Non-Negative Tensor Factorization and sparse representations, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[22] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[23] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[24] Constantine Kotropoulos, et al. Music genre classification via sparse representations of auditory temporal modulations, 2009, 2009 17th European Signal Processing Conference.

[25] Meinard Müller, et al. Chroma Toolbox: Matlab Implementations for Extracting Variants of Chroma-Based Audio Features, 2011, ISMIR.

[26] Thierry Bertin-Mahieux, et al. The Million Song Dataset, 2011, ISMIR.

[27] Ze-Nian Li, et al. Audio feature reduction and analysis for automatic music genre classification, 2014, 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC).

[28] Gerald Penn, et al. Convolutional Neural Networks for Speech Recognition, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[29] Florian Mouret, et al. Music Feature Maps with Convolutional Neural Networks for Music Genre Classification, 2017, CBMI.

[30] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.

[31] Yoon Kim, et al. Convolutional Neural Networks for Sentence Classification, 2014, EMNLP.

[32] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[33] Peter Grosche, et al. Cyclic tempogram: A mid-level tempo representation for music signals, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.