Music Mood Detection Based on Audio and Lyrics with Deep Neural Net

We consider the task of multimodal music mood prediction based on the audio signal and the lyrics of a track. We reproduce traditional approaches based on feature engineering and propose a new model based on deep learning. We compare the performance of both approaches on a database of 18,000 tracks annotated with valence and arousal values. Our approach outperforms the classical models on the arousal detection task, and both approaches perform equally well on the valence prediction task. We also compare a posteriori (late) fusion with a fusion of modalities optimized simultaneously with each unimodal model, and observe a significant improvement in valence prediction. We release part of our database for comparison purposes.
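
To make the idea of a posteriori fusion concrete, here is a minimal sketch in which two unimodal models have already produced valence predictions and the fusion step only combines their outputs. The weighted average, the grid search over the weight, the function names, and the toy values are all assumptions made for illustration; the abstract does not specify the fusion scheme actually used.

```python
from typing import List, Sequence


def late_fusion(audio_pred: Sequence[float],
                lyrics_pred: Sequence[float],
                weight: float) -> List[float]:
    """Combine two unimodal predictions with a weighted average.

    `weight` is the share given to the audio model (hypothetical scheme).
    """
    return [weight * a + (1.0 - weight) * l
            for a, l in zip(audio_pred, lyrics_pred)]


def mean_squared_error(pred: Sequence[float],
                       target: Sequence[float]) -> float:
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def pick_fusion_weight(audio_pred: Sequence[float],
                       lyrics_pred: Sequence[float],
                       target: Sequence[float],
                       steps: int = 10) -> float:
    """Grid-search the fusion weight on held-out data.

    The unimodal models stay frozen; only the combination is tuned,
    which is what distinguishes this from joint optimization.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: mean_squared_error(
            late_fusion(audio_pred, lyrics_pred, w), target),
    )


# Toy validation data (made up for illustration, not from the paper).
audio_valence = [0.2, 0.6, -0.4, 0.9]
lyrics_valence = [0.4, 0.2, -0.8, 0.5]
true_valence = [0.3, 0.4, -0.6, 0.7]

best_w = pick_fusion_weight(audio_valence, lyrics_valence, true_valence)
fused = late_fusion(audio_valence, lyrics_valence, best_w)
fused_error = mean_squared_error(fused, true_valence)
```

In the alternative compared in the abstract, the fusion would instead be trained together with the unimodal models, so that each model can adapt to what the other modality provides rather than being fixed before the combination step.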
