‘Mister D.J., Cheer Me Up!’: Musical and Textual Features for Automatic Mood Classification

Abstract The mass consumption of large digital music collections calls for efficient and intuitive means of organization. This article presents a system that recognizes the mood evoked by music on the basis of a wide variety of features while adhering closely to real-world conditions. A two-dimensional mood model is discussed in which moods are represented by binary values for arousal and valence, and a simple, user-friendly method is presented by which a fuzzy seven-class mood clustering is derived. The songs of the ‘Twenty Years of MTV Europe Most Wanted’ database of recorded pop music tracks serve to evaluate three groups of features: first, traditional audio-content features such as rhythm and tonal features, zero-crossing rate, cepstral coefficients, and MPEG-7 Low Level Descriptors are extracted. Second, lyrics, chord sequences, and genre data are obtained from on-line sources. Third, from all of these, the high-level features musical mode and, as a novel feature, the best-suited ballroom dance style are derived automatically. Feature selection is data-driven, and Support Vector Machines are used for classification. Prediction accuracies of 77.4% for arousal and 72.9% for valence, as well as 71.8% (including neighbours) for the seven-class cluster model, are obtained while preserving realism throughout in terms of non-prototypical music selection and feature extraction.
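To make the described setup concrete, the following is a minimal sketch, not the authors' implementation, of the two-dimensional approach: two binary Support Vector Machines, one each for arousal and valence, trained on pre-extracted per-song feature vectors, with the two decisions combined into a quadrant of the mood plane from which a finer clustering could then be derived. The use of scikit-learn, the random toy data, and the quadrant labels are all assumptions for illustration; the actual feature extraction (rhythm, tonal, cepstral, lyric, and chord features) and the fuzzy seven-class clustering are taken as given.

# Minimal sketch, assuming scikit-learn and placeholder data; the paper's
# feature extraction and seven-class fuzzy clustering are not reproduced.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_dimension_classifier(X, y):
    """Train one binary SVM for a single mood dimension (arousal or valence)."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(X, y)
    return clf

# Toy stand-in for per-song feature vectors (rhythm, chroma, cepstral, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))                # 200 songs, 60 features each
y_arousal = rng.integers(0, 2, size=200)      # 0 = low, 1 = high arousal
y_valence = rng.integers(0, 2, size=200)      # 0 = negative, 1 = positive

arousal_clf = train_dimension_classifier(X, y_arousal)
valence_clf = train_dimension_classifier(X, y_valence)

# Combine both binary decisions into a quadrant of the arousal/valence plane;
# the quadrant labels below are illustrative, not the paper's cluster names.
QUADRANTS = {(1, 1): "excited/happy", (1, 0): "angry/anxious",
             (0, 1): "calm/content", (0, 0): "sad/depressed"}

def mood_quadrant(x):
    a = arousal_clf.predict(x.reshape(1, -1))[0]
    v = valence_clf.predict(x.reshape(1, -1))[0]
    return QUADRANTS[(a, v)]

print(mood_quadrant(X[0]))

In this reading, each quadrant of the plane corresponds to one coarse mood region; assigning a song soft memberships in several neighbouring regions is one plausible route to the fuzzy seven-class clusters the abstract mentions.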
