Lyrics-Based Music Genre Classification Using a Hierarchical Attention Network

Music genre classification, especially using lyrics alone, remains a challenging topic in Music Information Retrieval. In this study, we apply recurrent neural network models to classify a large dataset of intact song lyrics. Because lyrics exhibit a hierarchical structure, in which words combine to form lines, lines form segments, and segments form a complete song, we adapt a hierarchical attention network (HAN) to exploit these layers and, at the same time, to learn the relative importance of individual words, lines, and segments. We test the model on a 117-genre dataset and a reduced 20-genre dataset. Experimental results show that the HAN outperforms both non-neural models and simpler neural models, whilst also classifying across more genres than previous research. The learned attention weights also let us visualise which words or lines in a song the model considers important for identifying the genre. As a result, the HAN offers insight, from a computational perspective, into the lyrical structure and language features that differentiate musical genres.
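
To make the architecture concrete, the sketch below shows the attention pooling that a HAN applies at each level of the hierarchy, following the standard formulation of Yang et al. (2016). This is an illustrative PyTorch sketch rather than the paper's implementation: for brevity it collapses the hierarchy to two levels (words pooled into line vectors, lines pooled into a song vector); the segment level described above works identically, and all layer sizes and hyperparameters here are assumptions.

```python
# Illustrative sketch of a two-level hierarchical attention network (HAN)
# for lyrics classification. Layer sizes, vocab handling, and the two-level
# (rather than three-level) hierarchy are assumptions for brevity.
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Scores each timestep against a learned context vector and returns the
    attention-weighted sum, plus the weights themselves (which can be
    visualised to see which words or lines the model found important)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)         # u_t = tanh(W h_t + b)
        self.context = nn.Parameter(torch.randn(hidden_dim))  # learned "query" vector

    def forward(self, h: torch.Tensor):
        # h: (batch, seq_len, hidden_dim) -- encoder outputs for one level
        u = torch.tanh(self.proj(h))                     # (batch, seq_len, hidden_dim)
        scores = u @ self.context                        # (batch, seq_len)
        alpha = torch.softmax(scores, dim=1)             # attention weights
        pooled = (alpha.unsqueeze(-1) * h).sum(dim=1)    # (batch, hidden_dim)
        return pooled, alpha


class HAN(nn.Module):
    """Word -> line -> song hierarchy: each level encodes its children with a
    bidirectional GRU and summarises them with attention pooling."""

    def __init__(self, vocab_size: int, embed_dim: int = 100,
                 hidden: int = 64, n_genres: int = 20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.word_gru = nn.GRU(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.word_attn = AttentionPool(2 * hidden)
        self.line_gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.line_attn = AttentionPool(2 * hidden)
        self.classify = nn.Linear(2 * hidden, n_genres)

    def forward(self, songs: torch.Tensor):
        # songs: (batch, n_lines, n_words) of padded word indices
        b, n_lines, n_words = songs.shape
        words = self.embed(songs.reshape(b * n_lines, n_words))
        h_w, _ = self.word_gru(words)
        line_vecs, word_alpha = self.word_attn(h_w)      # one vector per line
        line_vecs = line_vecs.view(b, n_lines, -1)
        h_l, _ = self.line_gru(line_vecs)
        song_vec, line_alpha = self.line_attn(h_l)       # one vector per song
        return self.classify(song_vec), word_alpha, line_alpha
```

Returning the attention weights alongside the logits is what enables the visualisation described above: for a given song, `word_alpha` and `line_alpha` can be rendered as heatmaps over the lyrics to show which words and lines drove the genre prediction.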
