A framework for evaluating multimodal music mood classification

This research proposes a framework for music mood classification that exploits multiple, complementary information sources, namely music audio, lyric text, and social tags associated with music pieces. This article presents the framework and a thorough evaluation of each of its components. Experimental results on a large data set spanning 18 mood categories show that systems combining lyrics and audio significantly outperformed systems using audio-only features. Automatic feature selection techniques were further shown to reduce the feature space. In addition, an examination of learning curves shows that the hybrid systems using both lyrics and audio needed fewer training samples and shorter audio clips to achieve classification accuracies equal to or better than those of systems using lyrics or audio alone. Last but not least, performance comparisons reveal the relative importance of audio and lyric features across mood categories.
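To make the fused (hybrid) setup concrete, the sketch below shows one common way such a system can be assembled: bag-of-words lyric features and precomputed audio descriptors are concatenated (early fusion), passed through automatic feature selection, and classified with a linear SVM. This is a minimal illustration under assumed data and design choices, not the authors' implementation; the toy lyrics, audio feature values, mood labels, and parameter settings (e.g. `k=10`) are all hypothetical.

```python
# Minimal sketch of a hybrid lyrics + audio mood classifier with
# feature concatenation, automatic feature selection, and a linear SVM.
# All data below is invented purely for illustration.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical lyric excerpts, 2-D "audio" feature vectors (standing in
# for spectral/timbral descriptors), and mood labels.
lyrics = [
    "sunshine and dancing all night long",
    "tears falling in the cold grey rain",
    "we celebrate together hands held high",
    "alone in the dark with my sorrow",
]
audio_features = np.array([
    [0.82, 0.65],   # e.g. high energy, bright timbre
    [0.21, 0.30],
    [0.90, 0.72],
    [0.15, 0.25],
])
moods = ["happy", "sad", "happy", "sad"]

# Bag-of-words (TF*IDF) features from lyrics.
vectorizer = TfidfVectorizer()
lyric_matrix = vectorizer.fit_transform(lyrics)

# Early fusion: concatenate the lyric and audio feature blocks.
hybrid = hstack([lyric_matrix, csr_matrix(audio_features)])

# Feature selection (chi-squared) followed by a linear SVM, wrapped in a
# pipeline so selection happens inside each cross-validation fold.
model = Pipeline([
    ("select", SelectKBest(chi2, k=10)),
    ("svm", LinearSVC()),
])

scores = cross_val_score(model, hybrid, moods, cv=2)
print("toy cross-validation accuracy:", scores.mean())
```

A late-fusion variant would instead train separate lyric and audio classifiers and combine their outputs (e.g. by averaging posterior estimates); the choice between the two is one of the design questions such a framework is meant to evaluate.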
