Determination of Nonprototypical Valence and Arousal in Popular Music: Features and Performances

The mood of music is among the most relevant and commercially promising, yet challenging, attributes for retrieval in large music collections. This article first provides a short overview of methods and performances in the field. While most past research has dealt with low-level audio descriptors, this article reports results exploiting middle-level information such as the rhythmic and chordal structure or the lyrics of a musical piece. Special attention is given to the realism and nonprototypicality of the songs in the database: all feature information is obtained by fully automatic preclassification, apart from the lyrics, which are automatically retrieved from on-line sources. Furthermore, instead of exclusively picking songs on which several annotators agree upon the perceived mood, a full collection of 69 double CDs, i.e., 2,648 titles, is processed. Owing to the severity of this task, different modelling forms in the arousal and valence space are investigated, and the relevance of each feature group is reported.
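As an illustration of the dimensional modelling the abstract refers to, the following minimal Python sketch regresses continuous arousal and valence values from a handful of middle-level descriptors. It is not the paper's implementation: the feature values are synthetic placeholders, and the four feature columns (tempo, major-chord ratio, chord-change rate, lyric valence) are assumptions chosen purely for the example.

```python
# Minimal sketch (not the paper's system): regress continuous arousal and
# valence from middle-level descriptors. Feature values are synthetic
# placeholders; in the paper they stem from automatic preclassification
# (rhythm, chords) and on-line lyric retrieval.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical middle-level features per song:
# [tempo_bpm, major_chord_ratio, chord_change_rate, lyric_valence]
X = rng.random((200, 4)) * [180.0, 1.0, 2.0, 1.0]

# Synthetic continuous targets on Russell's circumplex, scaled to [-1, 1]:
# faster tempo -> higher arousal; major chords and positive lyrics -> higher valence.
arousal = np.tanh(0.02 * (X[:, 0] - 100.0) + rng.normal(0, 0.1, 200))
valence = np.tanh(2.0 * (X[:, 1] - 0.5) + (X[:, 3] - 0.5) + rng.normal(0, 0.1, 200))

# One support vector regressor per affect dimension, treating arousal and
# valence as independent axes of the dimensional emotion space.
for name, y in [("arousal", arousal), ("valence", valence)]:
    scores = cross_val_score(SVR(kernel="rbf"), X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.2f}")
```

One regressor per axis mirrors the common treatment of the circumplex model as two independent dimensions; the article additionally investigates discretised (class-based) forms of the same space.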
