CCA and a Multi-way Extension for Investigating Common Components between Audio, Lyrics and Tags.

In our previous work, we used canonical correlation analysis (CCA) to extract shared information between audio and lyrical features for a set of songs. There, we discovered that what audio and lyrics share can be largely captured by two components that coincide with the dimensions of the core affect space: valence and arousal. In the current paper, we extend this work significantly in three ways. Firstly, we exploit the availability of the Million Song Dataset together with the MusiXmatch lyrics data to expand the size of the data set. Secondly, we now also include social tags from Last.fm in our analysis, applying CCA between the tag and lyrics representations as well as between the tag and audio representations of a song. Thirdly, we demonstrate how a multi-way extension of CCA can be used to study these three data sets simultaneously in a single, integrated experiment. We find that 2-way CCA generally (but not always) reveals certain mood aspects of the song, although the exact aspect varies depending on the pair of data types used. The 3-way CCA extension identifies components that lie somewhere in between the 2-way results and, interestingly, appears to be less prone to overfitting.
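For readers who want a concrete picture of the two kinds of analysis mentioned above, the sketch below shows a minimal Python version: 2-way CCA via scikit-learn, and a multi-set CCA posed as a generalized eigenproblem. The random matrices, the feature dimensions and the MAXVAR-style multi-set formulation are illustrative assumptions, not the exact features or solver used in the paper.

import numpy as np
from scipy.linalg import eigh
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_songs = 500
X_audio = rng.standard_normal((n_songs, 40))    # placeholder audio features
X_lyrics = rng.standard_normal((n_songs, 60))   # placeholder lyrics features
X_tags = rng.standard_normal((n_songs, 30))     # placeholder tag features

# 2-way CCA between one pair of views (here: audio vs. lyrics).
cca = CCA(n_components=2)
U, V = cca.fit_transform(X_audio, X_lyrics)
corrs = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(2)]
print("2-way canonical correlations:", corrs)

# Multi-set CCA over all three views, via the generalized eigenproblem
# C w = lambda D w, where C is the covariance of the concatenated views
# and D is its block-diagonal (within-view) part.
views = [X - X.mean(axis=0) for X in (X_audio, X_lyrics, X_tags)]
Z = np.hstack(views)
C = Z.T @ Z / (n_songs - 1)
D = np.zeros_like(C)
offset = 0
for Xv in views:
    d = Xv.shape[1]
    D[offset:offset + d, offset:offset + d] = C[offset:offset + d, offset:offset + d]
    offset += d
# A small ridge term keeps D well conditioned (a common regularization choice).
vals, vecs = eigh(C, D + 1e-3 * np.eye(D.shape[0]))
w = vecs[:, -1]  # leading solution: one projection vector per view,
                 # stacked in the same order as the concatenation above
print("leading multi-set CCA eigenvalue:", vals[-1])

In this formulation each canonical component consists of one projection direction per view, which is what allows the 3-way analysis to be read alongside the pairwise 2-way results.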
