Deep Neural Networks: A Case Study for Music Genre Classification

Music classification is a challenging problem with many applications, given today's large-scale collections comprising gigabytes of music files and associated metadata, as well as online streaming services. Recent success with deep neural network architectures on large-scale datasets has inspired numerous studies in the machine learning community on pattern recognition and classification tasks such as automatic speech recognition, natural language processing, audio classification, and computer vision. In this paper, we explore a two-layer neural network with manifold learning techniques for music genre classification. We compare the classification accuracy of deep neural networks with that of a set of well-known learning models, including support vector machines (SVM and ℓ1-SVM), logistic regression, and ℓ1-regression, in combination with hand-crafted audio features for a genre classification task on a public dataset. Our experimental results show that neural networks are comparable with classic learning models when the data is represented in a rich feature space.
