Graph-Based Multimodal Music Mood Classification in Discriminative Latent Space

Automatic music mood classification is an important and challenging problem in music information retrieval (MIR) and has attracted growing attention from a variety of research areas. In this paper, we propose a novel multimodal method for music mood classification that exploits the complementarity of the lyrics and audio of a piece of music to improve classification accuracy. We first extract descriptive sentence-level lyrics and audio features from the music. We then project the paired low-level features of the two modalities into a learned common discriminative latent space, which both eliminates the heterogeneity between modalities and increases the discriminability of the resulting descriptions. On the basis of this latent representation, we employ a graph-learning-based multimodal classification model for music mood that takes the cross-modal similarity between local audio and lyrics descriptions into account, so that correlations between the modalities are exploited effectively. The predicted mood categories for the individual sentences of a song are then aggregated by a simple voting scheme. The effectiveness of the proposed method is demonstrated in experiments on a real dataset comprising more than 3,000 minutes of music and the corresponding lyrics.
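
The following is a minimal sketch of the pipeline the abstract describes, not the authors' implementation: paired sentence-level audio and lyrics features are projected into a shared latent space, a graph-based semi-supervised classifier predicts a mood per sentence, and per-song moods are obtained by majority voting. CCA and scikit-learn's LabelSpreading are assumed stand-ins for the paper's discriminative latent-space projection and graph-learning model, and all data, dimensions, and variable names are illustrative.

```python
# Hypothetical sketch: shared latent space + graph-based mood classification + voting.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)

# Toy paired sentence-level features: 200 sentences, 40-d audio, 60-d lyrics, 4 moods.
n, d_audio, d_lyrics, k = 200, 40, 60, 10
X_audio = rng.normal(size=(n, d_audio))
X_lyrics = rng.normal(size=(n, d_lyrics))
moods = rng.integers(0, 4, size=n)
song_id = np.repeat(np.arange(20), n // 20)   # 20 songs, 10 sentences each

# Step 1: learn a common latent space from the paired modalities
# (CCA used here in place of the paper's discriminative latent space).
cca = CCA(n_components=k).fit(X_audio, X_lyrics)
Z_audio, Z_lyrics = cca.transform(X_audio, X_lyrics)
Z = np.vstack([Z_audio, Z_lyrics])            # both modalities become graph nodes

# Step 2: graph-based classification with partial labels
# (LabelSpreading follows the local-and-global-consistency idea).
y = np.concatenate([moods, moods]).astype(int)
y_partial = y.copy()
y_partial[rng.random(2 * n) > 0.3] = -1       # hide ~70% of labels as unlabeled
model = LabelSpreading(kernel="rbf", gamma=0.5).fit(Z, y_partial)
sentence_pred = model.transduction_[:n]       # per-sentence predictions (audio nodes)

# Step 3: aggregate sentence-level predictions per song by majority vote.
for sid in np.unique(song_id):
    votes = sentence_pred[song_id == sid]
    print(f"song {sid}: predicted mood {np.bincount(votes).argmax()}")
```

Any projection that maps both modalities into one space (and any graph-based label-propagation model over cross-modal similarities) could be substituted at steps 1 and 2; the voting step simply picks the most frequent sentence-level mood per song.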
