The power of words: Enhancing music mood estimation with textual input of lyrics

Music mood estimation (MME) is a key technology in mood-based music recommendation. While mainstream MME research relies on audio analysis, the significance of lyrics text in predicting song emotion has gained increasing attention in recent years. One major impediment to MME research is the lack of a clearly labeled, publicly available dataset that annotates the emotion ratings of lyrics text and audio separately. In light of this, we compiled a dataset of 600 pop songs (iPop) from the mood ratings of 246 participants who experienced three different song sessions: lyrics text (L), audio music track (M), and the combination of lyrics text and audio music track (C). We then applied statistical analysis to estimate how lyrics text and audio contribute to a song's overall valence-arousal (V-A) mood ratings. Our results show that lyrics text is not only a valid measure for estimating a song's mood ratings but also provides supplementary information that can improve audio-only MME systems. Furthermore, a detailed examination suggests that lyrics-text (L) ratings are better estimators of a song's overall mood ratings (C) in cases where L and M ratings conflict. We then construct an MME system that employs features extracted from both lyrics text and the audio music track, and we validate the conclusions drawn from our statistical analysis. In estimating either the V or the A rating, the model with both lyrics-text and audio-track features outperforms the models with only lyrics-text or only audio-track features. These results corroborate the findings of the statistical analysis.
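To make the described fusion approach concrete, the following is a minimal sketch (not the authors' implementation) of a regression-based MME model that combines lyrics-text features with audio features to predict a valence rating. The feature choices (TF-IDF for lyrics, generic audio descriptors such as MFCC statistics), the early-fusion concatenation, the ridge regressor, and the toy data are all assumptions introduced purely for illustration.

```python
# Minimal sketch, assuming TF-IDF lyrics features, precomputed audio
# descriptors (e.g., MFCC statistics), and a ridge regressor per V-A
# dimension. All data below is synthetic and hypothetical.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical inputs: one lyrics string and one audio-feature vector per
# song, plus an annotated valence rating for each song.
lyrics = ["love and sunshine all day", "cold rain and lonely nights"] * 50
audio_feats = np.random.rand(100, 20)   # e.g., MFCC means and variances
valence = np.random.rand(100)           # annotated V ratings in [0, 1]

# Lyrics-text features: bag-of-words TF-IDF.
tfidf = TfidfVectorizer(max_features=500)
text_feats = tfidf.fit_transform(lyrics).toarray()

# Early fusion: concatenate lyrics-text and audio features.
combined = np.hstack([text_feats, audio_feats])

X_train, X_test, y_train, y_test = train_test_split(
    combined, valence, test_size=0.2, random_state=0)

# One regressor per mood dimension (valence shown; arousal is analogous).
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("Valence R^2:", model.score(X_test, y_test))
```

Under this kind of setup, comparing the combined model against models trained on the lyrics-text block or the audio block alone is what allows the "lyrics plus audio outperforms either alone" comparison reported above.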
